Overview
Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.
Use This Model
Label Data Automatically with Kosmos-2
You can automatically label a dataset with Kosmos-2 using Autodistill, an open source package for training computer vision models. Labeling a folder of images takes only a few lines of code, and the resulting labeled dataset can then be used to train a computer vision model.
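As a rough illustration of what those few lines might look like, here is a minimal sketch. It assumes the `autodistill` and `autodistill-kosmos-2` packages are installed and that they expose `CaptionOntology` and `Kosmos2` as imported below. The helper name `label_folder`, the folder path, and the `person` prompt are hypothetical and used only for illustration.

```python
def label_folder(input_folder, prompt_to_class, extension=".jpeg"):
    """Label every image in input_folder with Kosmos-2 via Autodistill.

    prompt_to_class maps the text prompt sent to the model to the class
    name saved in the labels, e.g. {"person": "person"}.
    """
    # Imported lazily so this helper can be defined even when the
    # packages (assumed: autodistill, autodistill-kosmos-2) are not
    # installed yet.
    from autodistill.detection import CaptionOntology
    from autodistill_kosmos_2 import Kosmos2

    # The ontology tells the base model what to look for and how to
    # name each detected class.
    base_model = Kosmos2(ontology=CaptionOntology(prompt_to_class))

    # Runs the model over every matching image in the folder and
    # writes out a labeled dataset.
    return base_model.label(input_folder, extension=extension)
```

A call such as `label_folder("./images", {"person": "person"})` would then label all `.jpeg` files in a hypothetical `./images` folder. The first run is likely to be slow because the model weights need to be downloaded.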
YOLOv8 uses the YOLOv8 PyTorch TXT annotation format. If your annotations are in a different format, you can use Roboflow's annotation conversion tools to get your data into the right format.
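For reference, each line of a YOLO TXT label file describes one box as `class_id x_center y_center width height`, with coordinates normalized to the range 0 to 1 by the image dimensions. The sketch below shows how a pixel-coordinate box could be converted by hand. The function name `to_yolo_line` and the sample values are hypothetical, and the input is assumed to be in `(x_min, y_min, x_max, y_max)` pixel form.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO TXT line."""
    x_min, y_min, x_max, y_max = box

    # YOLO stores the box center and size, each divided by the image
    # dimension along the same axis.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box in a 640x480 image, labeled as class 0.
print(to_yolo_line(0, (100, 200, 300, 400), 640, 480))
# prints "0 0.312500 0.625000 0.312500 0.416667"
```

Each image gets one `.txt` file with the same base name, containing one such line per labeled object.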