How to smooth YOLO-World detections

When you are working with videos and live streams, you may want to smooth predictions between frames. This will help prevent any missed predictions in a frame from impacting post-processing logic like object tracking.

You can smooth YOLO-World predictions in a few lines of code using the supervision Python package. We will show how in this guide.

To smooth detections, we will:

1. Install supervision
2. Use the sv.DetectionsSmoother class to smooth detections

Without further ado, let's get started!

Step #1: Install supervision

First, install the supervision pip package:

pip install supervision
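
If you want to confirm the install before moving on, a quick check like the one below can help. This is a minimal sketch that uses only the Python standard library; installed_version is a hypothetical helper name, not part of supervision.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if supervision is installed, otherwise None
print(installed_version("supervision"))
```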


Once you have installed supervision, you are ready to load your data and start writing logic to smooth detections.

Step #2: Smooth Detections

The sv.DetectionsSmoother class records supervision detections across frames. It then applies logic to ensure that detections across multiple frames are consistent.

To use sv.DetectionsSmoother, you first need to apply an object tracking algorithm. Tracking gives each object a tracker ID that persists across frames, which lets supervision match the bounding boxes that belong to the same object and smooth them accurately.
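
To build intuition for why tracking comes first, here is a toy sketch of the general idea: keep a short history of boxes per tracker ID and average them. This is an illustration only, not supervision's implementation. BoxSmoother, its length parameter, and the dictionary input format are all assumptions made for this example, and the sketch ignores objects that are missing from the current frame.

```python
from collections import defaultdict, deque


class BoxSmoother:
    """Toy smoother: averages each tracked box over the last `length` frames."""

    def __init__(self, length=5):
        # One fixed-size history of boxes per tracker ID
        self.history = defaultdict(lambda: deque(maxlen=length))

    def update(self, tracked_boxes):
        """Take {tracker_id: (x1, y1, x2, y2)} for one frame, return averaged boxes."""
        smoothed = {}
        for tracker_id, box in tracked_boxes.items():
            frames = self.history[tracker_id]
            frames.append(box)
            # Average each coordinate over the frames stored for this ID
            smoothed[tracker_id] = tuple(
                sum(coords) / len(frames) for coords in zip(*frames)
            )
        return smoothed


smoother = BoxSmoother(length=3)
smoother.update({1: (0, 0, 10, 10)})
print(smoother.update({1: (4, 0, 14, 10)}))  # {1: (2.0, 0.0, 12.0, 10.0)}
```

Without a stable ID per object, there would be no way to know which boxes from earlier frames to average together, which is why the tracker runs before the smoother.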

To smooth detections with supervision, use the following code:


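from inference.models.yolo_world.yolo_world import YOLOWorld
import supervision as sv

# Placeholder paths: replace these with your own input and output video files
SOURCE_VIDEO_PATH = "SOURCE_FILE_PATH"
TARGET_VIDEO_PATH = "TARGET_FILE_PATH"

model = YOLOWorld(model_id="yolo_world/l")

# Classes that YOLO-World should detect
classes = ["person"]
model.set_classes(classes)

# Read video metadata (including fps) and create a frame generator
video_info = sv.VideoInfo.from_video_path(SOURCE_VIDEO_PATH)
frame_generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH)

# Create the tracker, smoother, and annotator once, before the loop
tracker = sv.ByteTrack(frame_rate=video_info.fps)
smoother = sv.DetectionsSmoother()
annotator = sv.BoundingBoxAnnotator()

with sv.VideoSink(TARGET_VIDEO_PATH, video_info=video_info) as sink:
    for frame in frame_generator:
        results = model.infer(frame, text=classes)
        detections = sv.Detections.from_inference(results)
        # Track first so each detection has a tracker ID, then smooth
        detections = tracker.update_with_detections(detections)
        detections = smoother.update_with_detections(detections)

        annotated_frame = annotator.annotate(frame.copy(), detections)
        sink.write_frame(annotated_frame)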

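The code above is designed to work with video files. To run on a live stream, move the per-frame logic into the callback you define for your stream, and create the tracker and smoother once, outside the callback, so that their state persists between frames.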
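
One way to structure a stream callback is sketched below. It assumes your stream source calls a function once per frame. make_frame_callback, infer, and on_detections are hypothetical names chosen for this example, not part of supervision or inference. The tracker and smoother are passed in so they are created once and reused on every frame.

```python
def make_frame_callback(infer, tracker, smoother, on_detections):
    """Build a per-frame callback that reuses one tracker and one smoother.

    infer:         function that takes a frame and returns detections
    tracker:       object with update_with_detections, e.g. sv.ByteTrack()
    smoother:      object with update_with_detections, e.g. sv.DetectionsSmoother()
    on_detections: function called with (frame, smoothed detections)
    """

    def callback(frame):
        detections = infer(frame)
        # Same order as in the video example: track first, then smooth
        detections = tracker.update_with_detections(detections)
        detections = smoother.update_with_detections(detections)
        on_detections(frame, detections)
        return detections

    return callback


# Possible wiring, assuming the model and classes from the code above:
# callback = make_frame_callback(
#     infer=lambda frame: sv.Detections.from_inference(model.infer(frame, text=classes)),
#     tracker=sv.ByteTrack(),
#     smoother=sv.DetectionsSmoother(),
#     on_detections=lambda frame, detections: print(len(detections)),
# )
```

Creating the tracker and smoother inside the callback would reset their history on every frame, so no smoothing would take place.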

Next steps

supervision provides an extensive range of functionalities for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Compute confusion matrices.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.
