How to detect small objects with YOLOv9 and SAHI

Detecting small objects is a common challenge in computer vision: the smaller an object appears in an image, the fewer pixels a model has to work with, and the more likely the object is to be missed at full-image resolution.

Overview

Slicing Aided Hyper Inference (SAHI) is a technique to improve small object detection performance with computer vision models. SAHI cuts an image into smaller images then runs inference on each smaller image. Predictions are then aggregated back together.
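
To make the idea concrete, here is a minimal, illustrative sketch of how an image can be divided into overlapping slices. This is not the SAHI implementation itself (sv.InferenceSlicer, used later in this guide, handles slicing for you), and the slice size and overlap values below are assumptions chosen for demonstration:

def slice_offsets(image_wh, slice_wh=(640, 640), overlap=0.2):
    # Compute overlapping slice windows that cover the full image
    image_w, image_h = image_wh
    slice_w, slice_h = slice_wh
    step_w = int(slice_w * (1 - overlap))
    step_h = int(slice_h * (1 - overlap))
    windows = []
    for y in range(0, image_h, step_h):
        for x in range(0, image_w, step_w):
            windows.append(
                (x, y, min(x + slice_w, image_w), min(y + slice_h, image_h))
            )
    return windows

# Each window is cropped, inference runs on the crop, and the
# resulting boxes are shifted back by (x, y) before being merged.
print(slice_offsets((1920, 1080)))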

In this guide, we are going to walk through how to use SAHI with YOLOv9 to improve your ability to detect small objects with a vision model.

To use SAHI with YOLOv9, we will:

  1. Install supervision
  2. Load a model
  3. Run inference using the sv.InferenceSlicer object

Let's get started!


Install supervision

First, install the supervision pip package:

pip install supervision


Once you have installed supervision, you are ready to load a model and start writing the logic for sliced inference.
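
To confirm the installation succeeded, you can print the installed version:

import supervision as sv

# Print the installed supervision version to verify the install
print(sv.__version__)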

Load Model

First, we are going to load a model for use in running inference. For this guide, we will use a YOLOv9 model. We will then define a function that will run inference on an image and load the data into an sv.Detections object.

Let's load our model then define a function that, given an image, will run inference:


import cv2
import numpy as np
import supervision as sv
from inference import get_model

# Load a YOLOv9 model hosted on Roboflow
model = get_model(model_id="microsoft-coco/9", api_key="API_KEY")

# Load the image on which you want to run inference
image = cv2.imread("image.jpg")

def callback(image_slice: np.ndarray) -> sv.Detections:
    # Run inference on a single slice of the image and load
    # the results into an sv.Detections object
    results = model.infer(image_slice)[0]
    return sv.Detections.from_inference(results)

Above, replace "microsoft-coco/9" with the model ID of a YOLOv9 model hosted on Roboflow, and "API_KEY" with your Roboflow API key.

To upload a model to Roboflow, first install the Roboflow Python package:

pip install roboflow

Then, create a new Python file and paste in the following code:


from roboflow import Roboflow

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
project.version(DATASET_VERSION).deploy(model_type="yolov9", model_path=f"{HOME}/runs/detect/train/")

In the code above, add your API key, your project ID, your dataset version, and the path to the model weights you want to upload. Your weights will be uploaded to Roboflow, and your model will shortly be accessible over an API and available for use in Inference. To learn more about uploading model weights to Roboflow, check out our full guide to uploading weights to Roboflow.


Run Inference with sv.InferenceSlicer

The sv.InferenceSlicer object takes a callback function that runs inference on an image slice and returns an sv.Detections object. The slicer divides a provided image into smaller parts, runs the callback on each, then combines the results into a single sv.Detections object. We can process the Detections object using supervision to accomplish tasks like plotting bounding boxes and filtering predictions.


slicer = sv.InferenceSlicer(callback=callback)
detections = slicer(image=image)

print(f"Number of predictions: {len(detections.xyxy)}")

box_annotator = sv.BoxAnnotator()

annotated_frame = box_annotator.annotate(
	scene=image.copy(),
	detections=detections
)

sv.plot_image(image=annotated_frame, size=(16, 16))

The above code uses SAHI to process an image, then plots and displays the annotated results.
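
You can also control how images are sliced. Smaller slices generally help with smaller objects, at the cost of more inference calls per image. As a sketch, recent supervision releases expose slice size and overlap settings along the lines below; the exact parameter names can vary between versions, and the values here are illustrative:

# Configure slice size and overlap (tune these for your images)
slicer = sv.InferenceSlicer(
	callback=callback,
	slice_wh=(320, 320),          # size of each slice
	overlap_ratio_wh=(0.2, 0.2),  # overlap between adjacent slices
	iou_threshold=0.5             # IoU threshold used when merging boxes
)
detections = slicer(image=image)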

You can try SAHI on an example image using a model trained on the Microsoft COCO dataset. The model can detect common objects like cars and cell phones, which is useful for visualizing how SAHI impacts model predictions.
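
To see the difference SAHI makes, you can run the same callback on the full image and compare the number of detections. This assumes the callback, image, and detections objects defined earlier in this guide:

# Run the model on the full image (no slicing) for comparison
full_image_detections = callback(image)

print(f"Full image: {len(full_image_detections.xyxy)} detections")
print(f"With SAHI: {len(detections.xyxy)} detections")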

Next Steps

supervision provides an extensive range of functionalities for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Compute confusion matrices.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.