QwenVL vs. LLaVA

Both QwenVL and LLaVA-1.5 are commonly used in computer vision projects. Below, we compare and contrast the two models.

Models


QwenVL

Qwen-VL is a Large Multimodal Model (LMM) developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs. The model can output text and bounding boxes. Qwen-VL naturally supports English, Chinese, and multilingual conversation.
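
To illustrate, here is a minimal sketch of querying Qwen-VL through the Hugging Face transformers library, following the usage documented on the Qwen/Qwen-VL-Chat model card. The image path and prompt are placeholders; adapt them to your data.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen-VL ships custom model code on the Hugging Face Hub,
# so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Build a multimodal query: an image plus a text instruction.
# "image.jpg" is a placeholder; Qwen-VL also accepts image URLs.
query = tokenizer.from_list_format([
    {"image": "image.jpg"},
    {"text": "Describe this image, and give a bounding box for the dog."},
])

# model.chat returns the response text and the running conversation history.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```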

LLaVA-1.5

LLaVA is an open source multimodal language model that you can use for visual question answering. It has limited support for object detection.
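
As a sketch of how you might run visual question answering with LLaVA-1.5, here is a minimal example using the llava-hf/llava-1.5-7b-hf checkpoint with the Hugging Face transformers library. The image URL and question are placeholders.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Placeholder image; replace with your own file or URL.
url = "https://example.com/image.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 uses a USER/ASSISTANT chat format with an <image> token
# marking where the image features are inserted.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```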
| | QwenVL | LLaVA-1.5 |
| --- | --- | --- |
| Model Type | Multimodal Model | Object Detection |
| Model Features | -- | -- |
| Architecture | -- | -- |
| Frameworks | -- | -- |
| Annotation Format | Instance Segmentation | Instance Segmentation |
| GitHub Stars | 3.3k+ | 16,000 |
| License | Tongyi Qianwen | Apache-2.0 |
| Training Notebook | -- | -- |

Compare QwenVL and LLaVA-1.5 with Autodistill

We ran seven tests across five state-of-the-art Large Multimodal Models (LMMs) on November 23rd, 2023. QwenVL passed five of the seven tests, while LLaVA passed one of seven.

Based on our tests, QwenVL performs better than LLaVA across a range of multimodal tasks.

Read more of our analysis.

Download the raw image results from our analysis.