How to use YOLOv9 with an RTSP Stream

Overview

Real Time Streaming Protocol (RTSP) is a protocol commonly used to stream video from internet-connected cameras. With supervision and Roboflow Inference, you can run a range of different models using the output of an RTSP stream in a few lines of code.
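RTSP URLs typically combine the camera's credentials, IP address, port, and a stream path. The values below are placeholders; check your camera's documentation for the exact format:

rtsp://username:password@192.168.1.10:554/stream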

In this guide, we are going to show you how to run YOLOv9 on frames from an RTSP camera.

To run a computer vision model on an RTSP stream, we will:

  1. Install supervision and Inference
  2. Use the InferencePipeline method to run inference
  3. Test the model

Let's get started!

Install supervision and Inference

For this tutorial, you will need two packages: supervision and Inference. You can install them using the following command:

pip install supervision inference


Once you have installed supervision and Inference, you are ready to start writing logic to use an RTSP stream with your model.

Configure InferencePipeline

The InferencePipeline method allows you to stream data from a webcam or RTSP stream for use in running predictions. The method lets you select a model to use, then runs a callback function that receives the predictions from the model and the frame on which inference was run.

Below, we show you how to use InferencePipeline with YOLOv9.

You can load data using the following code:


# import the InferencePipeline interface
from inference import InferencePipeline
# import a built-in sink called render_boxes (sinks are the logic that happens after inference)
from inference.core.interfaces.stream.sinks import render_boxes

api_key = "YOUR_ROBOFLOW_API_KEY"

# create an inference pipeline object
pipeline = InferencePipeline.init(
    model_id="yolov9s-640", # set the model id
    # set the video reference (source of video): this can be a link or path to a
    # video file, an RTSP stream URL, or an integer device id (usually 0 for a
    # built-in webcam)
    video_reference="rtsp://127.0.0.0:8000/stream",
    on_prediction=render_boxes, # tell the pipeline what to do with each set of predictions by passing a function (a sink)
    api_key=api_key, # provide your Roboflow API key for loading models from the Roboflow API
)
# start the pipeline
pipeline.start()
# wait for the pipeline to finish
pipeline.join()

Above, replace "yolov9s-640" with the model ID of the YOLOv9 object detection model you want to use, such as a model you have trained and hosted on Roboflow.
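For reference, model IDs for models hosted on Roboflow combine a project name and a version number. Here is a minimal sketch (the project name and version below are hypothetical placeholders):

pipeline = InferencePipeline.init(
    model_id="my-project/3", # hypothetical: your Roboflow project name and version
    video_reference="rtsp://127.0.0.0:8000/stream",
    on_prediction=render_boxes,
    api_key=api_key,
)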

To upload a model to Roboflow, first install the Roboflow Python package:

pip install roboflow

Then, create a new Python file and paste in the following code:


from roboflow import Roboflow

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
# DATASET_VERSION is your dataset version number; model_path points to the
# directory that contains your trained model weights
project.version(DATASET_VERSION).deploy(model_type="yolov9", model_path=f"{HOME}/runs/detect/train/")

In the code above, add your API key, your project ID, your dataset version, and the path to the model weights you want to upload. Learn how to retrieve your API key. Your weights will be uploaded to Roboflow, and your model will shortly be accessible over an API and available for use in Inference. To learn more about uploading model weights to Roboflow, check out our full guide to uploading weights to Roboflow.

Above, we load a model, then pass it into the InferencePipeline method for use in running inference. The function passed as on_prediction runs every time a frame is retrieved from the stream. In the example code above, we use the built-in render_boxes sink, which plots predictions from the model on each frame and displays the frame in a video stream. This allows you to watch your model run in real time and understand how it performs. You can also write your own callback containing any logic you want to run on each frame, as shown in the sketch below.
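As a minimal sketch of a custom sink, the function below annotates each frame with supervision and displays it. This assumes the callback receives the prediction dictionary and a video frame object with an image attribute, as render_boxes does; the name my_custom_sink is a placeholder:

import cv2
import supervision as sv

# create an annotator for drawing bounding boxes
box_annotator = sv.BoxAnnotator()

def my_custom_sink(predictions: dict, video_frame) -> None:
    # convert the raw prediction dictionary into a supervision Detections object
    detections = sv.Detections.from_inference(predictions)
    # draw the detections onto a copy of the frame
    annotated_frame = box_annotator.annotate(
        scene=video_frame.image.copy(), detections=detections
    )
    # display the annotated frame
    cv2.imshow("YOLOv9 RTSP stream", annotated_frame)
    cv2.waitKey(1)

To use it, pass on_prediction=my_custom_sink when initializing the pipeline.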

Replace the 127.0.0.0 URL with the URL of your RTSP camera. In addition, replace the api_key value with your Roboflow API key. Learn how to retrieve your Roboflow API key.

Test the Stream

Now that you have configured your model and streaming interface, you can test the stream. To do so, run your Python program.
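While the stream runs, you should see your model's predictions rendered on each frame. If you want to stop the stream cleanly, one approach is the sketch below, which assumes the pipeline object from the earlier example and that InferencePipeline exposes a terminate() method for shutting down the stream:

# a sketch of a graceful shutdown, assuming a terminate() method is available
try:
    pipeline.start()
    pipeline.join()
except KeyboardInterrupt:
    # stop consuming the stream when the user presses Ctrl+C
    pipeline.terminate()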

Next Steps

supervision provides an extensive range of functionalities for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Display predictions (e.g. bounding boxes, segmentation masks).
3. Annotate images (e.g. trace predictions, draw heatmaps).
4. Compute confusion matrices.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.
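For example, here is a minimal sketch of filtering detections by confidence inside a sink (the 0.5 threshold is an arbitrary choice):

import supervision as sv

def keep_confident_detections(predictions: dict) -> sv.Detections:
    # convert raw model output into a supervision Detections object
    detections = sv.Detections.from_inference(predictions)
    # keep only detections with confidence above 0.5 (arbitrary threshold)
    return detections[detections.confidence > 0.5]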