When you are working with videos and live streams, you may want to smooth predictions between frames. This will help prevent any missed predictions in a frame from impacting post-processing logic like object tracking.
You can smooth Transformers predictions in a few lines of code using the supervision Python package. We will show how in this guide.
To smooth detections, we will:
1. Install supervision
2. Use the sv.DetectionsSmoother() class to smooth detections
Without further ado, let's get started!
First, install the supervision pip package:
pip install supervision
Once you have installed supervision, you are ready to load your data and start writing logic to smooth detections.
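The sv.DetectionsSmoother() class records detections across frames, grouped by tracker ID. For each tracked object, it averages the bounding box coordinates over a short window of recent frames (five by default), so each object's detections stay consistent from one frame to the next.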
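To make the smoothing idea concrete, here is a simplified moving-average smoother in plain Python. It is a sketch under our own assumptions, not supervision's implementation: the class name `MovingAverageSmoother`, the dictionary input format (tracker ID mapped to an `(x_min, y_min, x_max, y_max)` box), and the window length are all hypothetical. Each tracked box is averaged over its last `length` frames, and a track that is missed in a frame keeps being reported from its remaining history until its whole window is empty.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class MovingAverageSmoother:
    """Hypothetical sketch: average each track's box over its last `length` frames."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        # tracker_id -> recent boxes; None marks a frame where the track was missed
        self.tracks: Dict[int, deque] = defaultdict(lambda: deque(maxlen=self.length))

    def update(self, detections: Dict[int, Box]) -> Dict[int, Box]:
        # record this frame's box for every track that was detected
        for tracker_id, box in detections.items():
            self.tracks[tracker_id].append(box)
        # record a miss for every known track that was not detected this frame
        for tracker_id in list(self.tracks):
            if tracker_id not in detections:
                self.tracks[tracker_id].append(None)
        # forget tracks whose whole window consists of misses
        for tracker_id in list(self.tracks):
            if all(b is None for b in self.tracks[tracker_id]):
                del self.tracks[tracker_id]
        # report the coordinate-wise mean of the observed boxes for each track
        return {tid: self._mean(history) for tid, history in self.tracks.items()}

    @staticmethod
    def _mean(history) -> Box:
        boxes = [b for b in history if b is not None]
        return tuple(sum(coords) / len(boxes) for coords in zip(*boxes))
```

For example, with `length=3`, feeding the boxes `(0, 0, 10, 10)` and then `(2, 2, 12, 12)` for track `1` yields `(1.0, 1.0, 11.0, 11.0)` on the second frame, and that averaged box is still reported on a third frame where track `1` was not detected at all.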
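sv.DetectionsSmoother() relies on tracker IDs to match each object's detections across frames, so you need to run an object tracking algorithm, such as sv.ByteTrack(), before smoothing. The tracker assigns a consistent ID to each object, which tells the smoother which bounding boxes to average together.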
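To see why tracker IDs matter, consider the toy tracker below. It is a hypothetical illustration, not ByteTrack: `GreedyIoUTracker` and `iou` are names we made up, and the tracker simply gives each box the ID of the previous-frame box it overlaps most (measured by intersection over union), starting a new ID when nothing overlaps enough. Those stable IDs are what let a smoother average the same object's boxes over time.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical toy tracker: greedy IoU matching against the previous frame only."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.previous: Dict[int, Box] = {}  # tracker_id -> box from the previous frame
        self._next_id = count(1)

    def update(self, boxes: List[Box]) -> Dict[int, Box]:
        assigned: Dict[int, Box] = {}
        unmatched = dict(self.previous)
        for box in boxes:
            # pick the unmatched previous-frame track that overlaps this box the most
            best_id: Optional[int] = None
            best_iou = self.iou_threshold
            for tid, prev in unmatched.items():
                score = iou(box, prev)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = next(self._next_id)  # no good match: start a new track
            else:
                del unmatched[best_id]  # each previous track is matched at most once
            assigned[best_id] = box
        self.previous = assigned
        return assigned
```

Unlike ByteTrack, which keeps lost tracks around for a configurable number of frames, this sketch forgets a track as soon as it misses a single frame, which is one reason to use a proper tracker in practice.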
To smooth detections with supervision, use the following code:
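```python
import supervision as sv

SOURCE_VIDEO_PATH = "SOURCE_FILE_PATH"
TARGET_VIDEO_PATH = "TARGET_FILE_PATH"

# initialize your Transformers model here
model = ...

# read the source video's resolution and FPS, and iterate over its frames
video_info = sv.VideoInfo.from_video_path(SOURCE_VIDEO_PATH)
frame_generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH)

tracker = sv.ByteTrack(frame_rate=video_info.fps)
smoother = sv.DetectionsSmoother()
bounding_box_annotator = sv.BoundingBoxAnnotator()

with sv.VideoSink(TARGET_VIDEO_PATH, video_info=video_info) as sink:
    for frame in frame_generator:
        # add your Transformers inference logic here
        result = ...

        detections = sv.Detections.from_transformers(result)
        # assign a tracker ID to each detection; the smoother needs these IDs
        detections = tracker.update_with_detections(detections)
        # average each tracked object's boxes over recent frames
        detections = smoother.update_with_detections(detections)

        annotated_frame = bounding_box_annotator.annotate(frame.copy(), detections)
        sink.write_frame(annotated_frame)
```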
The code above processes a video file. To run it on a live stream, create the tracker and smoother once, then move the per-frame logic from inside the loop into the callback that your stream calls for each new frame.
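For live streams, the important detail is that the tracker and smoother keep state between frames, so they should be created once and reused by the per-frame callback rather than rebuilt for every frame. The sketch below shows that structure with plain callables; `make_frame_callback` and its parameters are hypothetical names, and in practice you might pass `tracker.update_with_detections` as `track`, `smoother.update_with_detections` as `smooth`, and your own inference and display functions as `infer` and `emit`.

```python
from typing import Any, Callable

Frame = Any  # e.g. an image array delivered by your stream


def make_frame_callback(
    infer: Callable[[Frame], Any],
    track: Callable[[Any], Any],
    smooth: Callable[[Any], Any],
    emit: Callable[[Frame, Any], None],
) -> Callable[[Frame], None]:
    """Hypothetical helper: build a per-frame callback around long-lived stages.

    The tracker and smoother state lives behind `track` and `smooth`, which are
    created once by the caller, so it persists across every call to `on_frame`.
    """

    def on_frame(frame: Frame) -> None:
        detections = infer(frame)        # model inference -> detections
        detections = track(detections)   # assign tracker IDs
        detections = smooth(detections)  # smooth boxes per tracker ID
        emit(frame, detections)          # annotate, display, or write the result

    return on_frame
```

Because the callback only closes over the stages it was given, the same function can be registered with any stream source that invokes a callback once per frame.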
supervision provides an extensive range of functionality for working with computer vision models. With supervision, you can:
1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Compute confusion matrices.
And more! To learn about the full range of functionality in supervision, check out the supervision documentation.