LLaVA vs. Kosmos-2

LLaVA-1.5 and Kosmos-2 are both multimodal models commonly used in computer vision projects. Below, we compare the two.

Models

LLaVA-1.5

LLaVA is an open source multimodal language model that you can use for visual question answering. It also has limited support for object detection.

Kosmos-2

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.

| Attribute         | LLaVA-1.5             | Kosmos-2              |
|-------------------|-----------------------|-----------------------|
| Model Type        | Object Detection      | Object Detection      |
| Architecture      | --                    | --                    |
| Frameworks        | --                    | --                    |
| Annotation Format | Instance Segmentation | Instance Segmentation |
| GitHub Stars      | 16,000                | --                    |
| License           | Apache-2.0            | --                    |

Compare LLaVA-1.5 and Kosmos-2 with Autodistill

Using Autodistill, you can compare LLaVA and Kosmos-2 on your own images in a few lines of code.

To start a comparison, first install the required dependencies:


pip install autodistill autodistill-llava autodistill-kosmos-2

Next, create a new Python file and add the following code:


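# Note: the autodistill-kosmos-2 package is assumed to be importable as
# autodistill_kosmos_2; check the package README if this import fails.
from autodistill_kosmos_2 import Kosmos2
from autodistill_llava import LLaVA

from autodistill.detection import CaptionOntology
from autodistill.utils import compare

# Map each prompt sent to the model (key) to the class label to record (value)
ontology = CaptionOntology(
    {
        "solar panel": "solar panel",
    }
)

# Both models share the same ontology so their results are comparable
models = [
    Kosmos2(ontology=ontology),
    LLaVA(ontology=ontology)
]

# Absolute paths to the images to run through each model
images = [
    "/home/user/autodistill/solarpanel1.jpg",
    "/home/user/autodistill/solarpanel2.jpg"
]

# Run every model on every image and show the results
compare(
    models=models,
    images=images
)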

Above, replace the paths in the `images` list with the paths to the images you want to use.

Each path must be an absolute path.
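
If your images live in a single folder, you may prefer to build the list of absolute paths rather than typing each one. The following is a minimal sketch using only the Python standard library. The helper name `absolute_image_paths` and its `extension` parameter are illustrative and are not part of Autodistill.

```python
from pathlib import Path


def absolute_image_paths(folder, extension=".jpg"):
    """Return sorted absolute paths of files in `folder` with the given extension.

    Hypothetical helper (not an Autodistill function). The extension check
    is case-insensitive, so both .jpg and .JPG files are matched.
    """
    # resolve() turns a relative folder such as "./images" into an absolute path
    root = Path(folder).expanduser().resolve()
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() == extension.lower()
    )
```

You could then write `images = absolute_image_paths("./images")` in place of the hand-written list.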

Then, run the script.

The script displays each model's predictions on each of your images so you can compare them visually.
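
If you would also like a numeric summary to go with the visual comparison, you can count how many predictions each model returns per image. The sketch below assumes that each model exposes a `predict(image_path)` method whose return value supports `len()`, which is how Autodistill detection models are expected to behave. The function name `count_detections` is hypothetical and is not part of Autodistill.

```python
def count_detections(models, images):
    """Return {model class name: {image path: number of predictions}}.

    Assumption: model.predict(image_path) returns a sized collection of
    detections. This helper is illustrative, not an Autodistill API.
    """
    summary = {}
    for model in models:
        # Use the class name (for example "Kosmos2") as the row label
        name = type(model).__name__
        summary[name] = {image: len(model.predict(image)) for image in images}
    return summary
```

Calling `count_detections(models, images)` with the `models` and `images` lists from the script returns a nested dictionary that you can print or log. A raw count says nothing about whether the predictions are correct, so treat it as a rough sanity check only.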

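Once you have chosen the model that works best for your use case, you can auto-label a folder of images using the following code:


# base_model is whichever model you selected, for example:
base_model = Kosmos2(ontology=ontology)

# Label every .jpg file in ./images and write the results to ./dataset
base_model.label(
  input_folder="./images",
  output_folder="./dataset",
  extension=".jpg"
)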
