Use the widget below to experiment with Qwen-VL. You can detect COCO classes such as people, vehicles, animals, and household items.
Qwen-VL is an LMM (large multimodal model) developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs, and can output text and bounding boxes. Qwen-VL natively supports English, Chinese, and multilingual conversation, so it may be worth exploring if you expect prompts or answers to mix Chinese and English. Qwen-VL currently outperforms both ChatGPT and Gemini on Chinese question answering tasks, a notable milestone for an open-source model.
Qwen-VL is licensed under a Tongyi Qianwen license.
Qwen-VL achieves state-of-the-art results on zero-shot captioning, where it outperforms previous generalist models on four benchmarks. Despite its relatively small size, its higher image resolution processing lets it compete with slightly larger state-of-the-art models.
You can use Roboflow Inference to deploy a Qwen-VL API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
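Once an Inference server is running, a client typically sends it a base64-encoded image alongside a text prompt over HTTP. The sketch below illustrates that pattern; the endpoint path, port, and payload field names are assumptions for illustration, not the documented Roboflow Inference API, so check the official docs for the exact request shape.

```python
# Minimal sketch of querying a locally running inference server that
# serves Qwen-VL. The URL and payload fields are hypothetical.
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, prompt: str) -> dict:
    """Encode an image and pair it with a text prompt for a multimodal request."""
    return {
        "image": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("ascii"),
        },
        "prompt": prompt,
    }


def query_qwen_vl(payload: dict,
                  url: str = "http://localhost:9001/qwen_vl/infer") -> dict:
    """POST the payload to the (assumed) endpoint and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice you would read an image file's bytes, call `build_payload(image_bytes, "Describe this image.")`, and pass the result to `query_qwen_vl` with the URL your server actually exposes.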