CogVLM vs. LLaVA

Both CogVLM and LLaVA-1.5 are commonly used in computer vision projects. Below, we compare and contrast CogVLM and LLaVA-1.5.

Models


CogVLM

CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks.
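
To give a sense of how CogVLM is typically run, here is a minimal VQA sketch adapted from the usage shown on the THUDM/cogvlm-chat-hf Hugging Face checkpoint, which loads its own modeling code via trust_remote_code. The image path, question, and generation arguments are placeholders and assumptions; exact details may vary with your transformers version.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

# CogVLM-chat uses the Vicuna tokenizer and ships custom modeling code.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda").eval()

# Placeholder image path and question: swap in your own.
image = Image.open("street.jpg").convert("RGB")
inputs = model.build_conversation_input_ids(
    tokenizer, query="What objects are in this image?", history=[], images=[image]
)
inputs = {
    "input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": inputs["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[inputs["images"][0].to("cuda").to(torch.bfloat16)]],
}

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Strip the prompt tokens and decode only the generated answer.
    print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```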

LLaVA-1.5

LLaVA-1.5 is an open-source multimodal language model that you can use for visual question answering. It has limited support for object detection.
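
As a rough illustration of visual question answering with LLaVA-1.5, the sketch below uses the community llava-hf/llava-1.5-7b-hf checkpoint with Hugging Face Transformers. The image URL and question are placeholders, and prompt formatting can differ between transformers versions.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image URL and question: swap in your own.
image = Image.open(requests.get("https://example.com/street.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat objects are in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# The decoded string includes the prompt followed by the model's answer.
print(processor.decode(output[0], skip_special_tokens=True))
```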

                      CogVLM                  LLaVA-1.5
Model Type            Multimodal Model        Object Detection
Architecture          --                      --
Frameworks            PyTorch                 --
Annotation Format     Instance Segmentation   Instance Segmentation
GitHub Stars          4.7k+                   16,000
License               Apache-2.0              Apache-2.0

Compare CogVLM and LLaVA-1.5 with Autodistill

We ran seven tests across five state-of-the-art Large Multimodal Models (LMMs) on November 23rd, 2023. CogVLM passed five of the seven tests, while LLaVA-1.5 passed one.

Based on our tests, CogVLM performs better than LLaVA-1.5 across a range of multimodal tasks.

Read more of our analysis.