Overview
Qwen-VL is an LMM (large multimodal model) developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs, and can output text and bounding boxes. Qwen-VL natively supports English, Chinese, and multilingual conversation, so the model is worth exploring if you expect Chinese and English to appear in prompts or answers. Qwen-VL outperforms both ChatGPT and Gemini on Chinese question answering tasks, a notable milestone for an open-source model.
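Because Qwen-VL can output bounding boxes, its grounding responses embed coordinates directly in the generated text. A minimal sketch of turning those responses into pixel coordinates is below; it assumes Qwen-VL's documented convention of `<box>(x1,y1),(x2,y2)</box>` tags with coordinates normalized to a 0–1000 range.

```python
import re

# Matches Qwen-VL grounding tags like <box>(100,200),(500,800)</box>.
BOX_PATTERN = re.compile(r"<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>")


def parse_boxes(text: str, image_width: int, image_height: int):
    """Convert <box> tags (0-1000 normalized) in a Qwen-VL response
    to (x1, y1, x2, y2) pixel coordinates for the given image size."""
    boxes = []
    for x1, y1, x2, y2 in BOX_PATTERN.findall(text):
        boxes.append((
            int(x1) * image_width / 1000,
            int(y1) * image_height / 1000,
            int(x2) * image_width / 1000,
            int(y2) * image_height / 1000,
        ))
    return boxes
```

For example, on a 1000×500 image, a response containing `<box>(100,200),(500,800)</box>` maps to the pixel box (100.0, 100.0, 500.0, 400.0).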
Performance
Qwen-VL achieves state-of-the-art results on zero-shot captioning, outperforming previous generalist models on four benchmarks. Despite its relatively small size, its higher input image resolution allows it to compete with larger state-of-the-art models.
Use This Model
Label Data Automatically with Qwen-VL
You can automatically label a dataset using Qwen-VL with help from Autodistill, an open source package for training computer vision models. You can label a folder of images automatically with only a few lines of code. Below, see our tutorials that demonstrate how to use Qwen-VL to train a computer vision model.
Deploy to Production
Roboflow offers a range of SDKs with which you can deploy your model to production.
YOLOv8 uses the YOLOv8 PyTorch TXT annotation format. If your annotations are in a different format, you can use Roboflow's annotation conversion tools to get your data into the right format.
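For reference, each line of a YOLO-style TXT annotation file holds one object as `class_id x_center y_center width height`, with coordinates normalized to the 0–1 range. A small sketch of decoding one line back to pixel coordinates:

```python
def parse_yolo_line(line: str, image_width: int, image_height: int):
    """Parse one YOLO TXT annotation line into (class_id, (x1, y1, x2, y2))
    in pixel coordinates. Input coordinates are center-based and normalized."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * image_width
    y1 = (yc - h / 2) * image_height
    x2 = (xc + w / 2) * image_width
    y2 = (yc + h / 2) * image_height
    return int(class_id), (x1, y1, x2, y2)
```

For example, the line `0 0.5 0.5 0.2 0.4` on a 1000×500 image decodes to class 0 with the pixel box (400.0, 150.0, 600.0, 350.0).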