Use the widget below to experiment with YOLO-World. You can detect COCO classes such as people, vehicles, animals, and household items.
YOLO-World, introduced in the research paper "YOLO-World: Real-Time Open-Vocabulary Object Detection", marks a significant advance in open-vocabulary object detection by demonstrating that lightweight detectors, such as those from the YOLO series, can achieve strong open-vocabulary performance. This is particularly noteworthy for real-world applications where efficiency and speed are crucial, such as edge deployments. As shown in the image below, YOLO-World achieves a 20x speedup over previous open-vocabulary models while maintaining comparable accuracy, which makes it well suited to real-time applications.
YOLO-World has grounding capabilities: it can understand the context of a prompt and use it to produce detections. You do not need to train the model on a particular class because it was trained on image-text pairs and grounded images. The model has learned to take an arbitrary prompt (for example, "person wearing a white shirt") and use it for detection.
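For instance, the prompt can be passed directly as the class list at inference time. Below is a minimal sketch using the Roboflow Inference API that is covered in detail in the deployment section later on this page; the prompt string is arbitrary:
from inference.models.yolo_world.yolo_world import YOLOWorld

model = YOLOWorld(model_id="yolo_world/l")

# "person wearing a white shirt" is not a fixed training label; the model
# grounds the phrase against the image at inference time.
results = model.infer(
    "image.jpeg",
    text=["person wearing a white shirt"],
    confidence=0.03,
)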
YOLO-World exclusively supports object detection.
YOLO-World is supported in Autodistill and Inference; for example, you can use it as an Autodistill base model to auto-label images (see the sketch below). YOLO-World is licensed under a GPL-3.0 license.
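The sketch below assumes the autodistill-yolo-world package follows Autodistill's standard base-model interface; the exact import name is an assumption, so check the package README before using it:
from autodistill.detection import CaptionOntology
from autodistill_yolo_world import YOLOWorldModel  # assumed import name; see the package README

# Map prompts (keys) to the class names you want in your dataset (values)
base_model = YOLOWorldModel(
    ontology=CaptionOntology({"person wearing a white shirt": "person"})
)

# Auto-label a folder of images using the prompts above
base_model.label("./images", extension=".jpeg")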
According to the paper, YOLO-World reached 35.4 AP at 52.0 FPS for the large version and 26.2 AP at 74.1 FPS for the small version, measured on an NVIDIA V100 GPU. While the V100 is a powerful GPU, achieving such high FPS on any device is impressive.
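If you want a rough throughput figure for your own hardware, you can time repeated inference calls. The sketch below uses the Inference API shown later on this page and reports an approximate FPS; it is an illustration, not a rigorous benchmark:
import time
from inference.models.yolo_world.yolo_world import YOLOWorld

model = YOLOWorld(model_id="yolo_world/l")
classes = ["person", "dog"]

# Warm-up run so one-time setup cost is not counted
model.infer("image.jpeg", text=classes, confidence=0.03)

runs = 20
start = time.perf_counter()
for _ in range(runs):
    model.infer("image.jpeg", text=classes, confidence=0.03)
elapsed = time.perf_counter() - start
print(f"~{runs / elapsed:.1f} FPS on this machine")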
You can use Roboflow Inference to deploy a YOLO-World API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
First, install Inference:
pip install inference
Run the following command to set your API key in your coding environment:
export ROBOFLOW_API_KEY=<your api key>
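If you prefer to set the key from Python rather than your shell, you can set the same environment variable before importing Inference (a small sketch; replace the placeholder with your key):
import os

# Set the key before importing inference so it is picked up at load time
os.environ["ROBOFLOW_API_KEY"] = "<your api key>"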
Then, create a new Python file called app.py and add the following code:
import cv2
import supervision as sv
from inference.models.yolo_world.yolo_world import YOLOWorld

# Load the image you want to run inference on
image = cv2.imread("image.jpeg")

# Load the YOLO-World model (large version)
model = YOLOWorld(model_id="yolo_world/l")

# The classes to detect; no training on these classes is required
classes = ["person", "backpack", "dog", "eye", "nose", "ear", "tongue"]

# Run inference with a low confidence threshold, since open-vocabulary
# confidence scores tend to be lower than those of fixed-class detectors
results = model.infer("image.jpeg", text=classes, confidence=0.03)

# Convert the raw response into a supervision Detections object
detections = sv.Detections.from_inference(results[0])

# Annotate the image with bounding boxes and class labels
bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

labels = [classes[class_id] for class_id in detections.class_id]

annotated_image = bounding_box_annotator.annotate(
    scene=image, detections=detections
)
annotated_image = label_annotator.annotate(
    scene=annotated_image, detections=detections, labels=labels
)

sv.plot_image(annotated_image)
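Because YOLO-World runs in real time, the same calls work in a video loop. Below is a minimal webcam sketch reusing the API above; it assumes model.infer accepts a NumPy frame in addition to the image-path usage shown earlier:
import cv2
import supervision as sv
from inference.models.yolo_world.yolo_world import YOLOWorld

model = YOLOWorld(model_id="yolo_world/l")
classes = ["person", "dog"]
box_annotator = sv.BoundingBoxAnnotator()

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Run open-vocabulary detection on the current frame
    results = model.infer(frame, text=classes, confidence=0.03)
    detections = sv.Detections.from_inference(results[0])
    annotated = box_annotator.annotate(scene=frame, detections=detections)
    cv2.imshow("YOLO-World", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()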