Overview

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.
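To make "grounding text in images" concrete: Kosmos-2 emits text in which a phrase is followed by two location tokens naming the top-left and bottom-right cells of a grid laid over the image. The sketch below parses such a string into phrases and normalized boxes. It assumes a 32 x 32 grid and the `<phrase>` / `<object>` / `<patch_index_NNNN>` token spelling; the helper names are hypothetical, and the box is taken as the outer edges of the two cells, which is a simplification rather than the reference post-processing. Phrases linked to several boxes are not handled.

```python
import re

# Assumption: the image is divided into a 32 x 32 grid of location bins.
GRID_SIZE = 32

# Matches one grounded span: a phrase followed by a pair of location tokens.
GROUNDED_SPAN = re.compile(
    r"<phrase>(?P<phrase>.*?)</phrase>"
    r"<object><patch_index_(?P<ul>\d{4})><patch_index_(?P<lr>\d{4})></object>"
)


def patch_pair_to_box(ul_index, lr_index, grid_size=GRID_SIZE):
    """Convert two grid-cell indices to a normalized (x1, y1, x2, y2) box.

    Simplification: the box spans from the top-left corner of the first
    cell to the bottom-right corner of the second cell.
    """
    cell = 1.0 / grid_size
    ul_row, ul_col = divmod(ul_index, grid_size)
    lr_row, lr_col = divmod(lr_index, grid_size)
    return (ul_col * cell, ul_row * cell, (lr_col + 1) * cell, (lr_row + 1) * cell)


def parse_grounded_text(text, grid_size=GRID_SIZE):
    """Return a list of (phrase, box) pairs found in grounded model output."""
    results = []
    for match in GROUNDED_SPAN.finditer(text):
        box = patch_pair_to_box(int(match["ul"]), int(match["lr"]), grid_size)
        results.append((match["phrase"].strip(), box))
    return results


if __name__ == "__main__":
    sample = (
        "An image of<phrase> a snowman</phrase>"
        "<object><patch_index_0044><patch_index_0863></object> by a fire"
    )
    for phrase, box in parse_grounded_text(sample):
        print(phrase, box)
```

Multiplying the normalized coordinates by the image width and height gives pixel boxes that can be drawn or compared with other detections.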

Kosmos-2 License

Kosmos-2 is licensed under a license.

Deploy a Kosmos-2 API

You can use Roboflow Inference to deploy a Kosmos-2 API on your hardware. The model can run on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).

Below are instructions on how to deploy your own model API.

Label Data Automatically with Kosmos-2

You can automatically label a dataset using Kosmos-2 with help from Autodistill, an open-source package for training computer vision models. A folder of images can be labeled with only a few lines of code. Below, see our tutorials that demonstrate how to use Kosmos-2 to train a computer vision model.
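As a rough illustration of what automatic labeling produces, the sketch below turns model detections into a label file for one image. It is not Autodistill's implementation: the function names are hypothetical, the detections are assumed to arrive as `(label, (x1, y1, x2, y2))` pairs with normalized coordinates, and the YOLO-style text format (`class cx cy w h`) is used only because it is a common choice for training data.

```python
from pathlib import Path


def to_yolo_line(class_id, box):
    """Format one normalized (x1, y1, x2, y2) box as a YOLO-style label line."""
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2
    center_y = (y1 + y2) / 2
    width = x2 - x1
    height = y2 - y1
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


def write_labels(image_path, detections, class_names, out_dir):
    """Write a .txt label file next to the image name; return its path.

    Detections whose label is not in class_names are skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        to_yolo_line(class_names.index(label), box)
        for label, box in detections
        if label in class_names
    ]
    label_path = out_dir / (Path(image_path).stem + ".txt")
    label_path.write_text("".join(line + "\n" for line in lines))
    return label_path
```

Running `write_labels` once per image in a folder, with detections supplied by the model, yields one label file per image that a detection trainer can consume.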
