LLaVA vs. Kosmos-2

Both LLaVA and Kosmos-2 are commonly used in computer vision projects. Below, we compare and contrast LLaVA and Kosmos-2.

                 LLaVA             Kosmos-2
Date of Release
Model Type       Object Detection  Object Detection
Architecture
GitHub Stars

Compare LLaVA and Kosmos-2 with Autodistill

Using Autodistill, you can compare LLaVA and Kosmos-2 on your own images in a few lines of code.

To start a comparison, first install the required dependencies:


pip install autodistill autodistill-llava autodistill-kosmos-2

Next, create a new Python file and add the following code:


from autodistill_kosmos2 import Kosmos2
from autodistill_llava import LLaVA

from autodistill.detection import CaptionOntology
from autodistill.utils import compare

ontology = CaptionOntology(
    {
        "solar panel": "solar panel",
    }
)

models = [
    Kosmos2(ontology=ontology),
    LLaVA(ontology=ontology)
]

images = [
    "/home/user/autodistill/solarpanel1.jpg",
    "/home/user/autodistill/solarpanel2.jpg"
]

compare(
    models=models,
    images=images
)

Above, replace the paths in the `images` list with the paths of the images you want to use.

The image paths must be absolute.
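Since the paths must be absolute, one way to build the list is to resolve every image in a folder with Python's standard `pathlib` module. This is a minimal sketch and not part of Autodistill: the helper name `absolute_image_paths` and the default `.jpg` extension are assumptions made for illustration.

```python
from pathlib import Path


def absolute_image_paths(folder, extension=".jpg"):
    """Return sorted absolute paths (as strings) of the images in `folder`.

    Hypothetical helper for illustration; not an Autodistill function.
    """
    # Expand "~" and turn a relative folder into an absolute one.
    folder_path = Path(folder).expanduser().resolve()
    return sorted(
        str(path)
        for path in folder_path.iterdir()
        # Keep regular files whose extension matches, ignoring case.
        if path.is_file() and path.suffix.lower() == extension.lower()
    )
```

You could then write `images = absolute_image_paths("./images")` in place of the hard-coded list.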

Then, run the script.

You should see the predictions from each model for your images, which you can compare side by side.

Once you have chosen the model that works best for your use case, you can auto-label a folder of images using the following code:


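# base_model is the model you chose above, for example Kosmos2 or LLaVA
base_model = Kosmos2(ontology=ontology)

base_model.label(
  input_folder="./images",
  output_folder="./dataset",
  extension=".jpg"
)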
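Before labeling, it can help to check which file extensions the input folder actually contains, so that the `extension` argument matches your files. The sketch below uses only the standard library. `extension_counts` is a hypothetical helper, not an Autodistill function, and it lowercases extensions purely to make the summary easier to read.

```python
from collections import Counter
from pathlib import Path


def extension_counts(folder):
    """Count the files in `folder` by lowercase extension.

    Hypothetical helper for illustration; not an Autodistill function.
    """
    return Counter(
        path.suffix.lower()
        for path in Path(folder).iterdir()
        if path.is_file()  # skip subdirectories
    )
```

Calling `extension_counts("./images")` returns a `Counter` keyed by extension, which shows at a glance whether the folder holds `.jpg` files or something else.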

LLaVA

LLaVA is an open-source multimodal language model that you can use for visual question answering. It also has limited support for object detection.

Kosmos-2

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.
