How to crop PaliGemma detections

Learn how to crop model predictions with supervision, an open source computer vision Python package with utilities for working with vision model results.

Overview

A common task when working with computer vision models is visualizing model predictions. Being able to inspect predictions qualitatively is useful in model development, in testing, and when preparing a model for production.

Using the supervision Python package, you can crop PaliGemma predictions in a few lines of code. In this guide, we will show how to load PaliGemma predictions and crop each detection out of the original image.

We will:

1. Install supervision
2. Load data
3. Crop detections with supervision

Without further ado, let's get started!

Install supervision

First, install the supervision pip package:

pip install supervision


Once you have installed supervision, you are ready to load your data and start writing logic to crop detections.

Load Data

First, we are going to load the PaliGemma output into a supervision Detections object. We can then use the bounding box coordinates stored in that object to crop each detection from the original image.



You can load data using the following code:


import supervision as sv

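# Raw text response from PaliGemma (placeholder).
paligemma_result = "..."
detections = sv.Detections.from_lmm(
    sv.LMM.PALIGEMMA,
    paligemma_result,
    resolution_wh=(1000, 1000),  # (width, height) of the input image in pixels
    classes=['cat', 'dog']  # class names; class_id indexes into this list
)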
detections.xyxy
# array([[250., 250., 750., 750.]])

detections.class_id
# array([0])

Replace the ... with the text response from your model, and set resolution_wh to the width and height of the image you passed to the model.
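To make the role of resolution_wh more concrete, here is a minimal sketch of how a PaliGemma detection string could be turned into pixel coordinates. It is not supervision's implementation. It assumes the commonly documented PaliGemma output format, in which each box is written as four <locNNNN> tokens ordered y_min, x_min, y_max, x_max on a 0-1023 grid, followed by a label. The function name parse_paligemma_boxes and the example string are hypothetical and used for illustration only.

```python
import re

# Assumed PaliGemma detection format: four <locNNNN> tokens per box, ordered
# y_min, x_min, y_max, x_max on a 0-1023 grid, followed by a text label.
LOC_PATTERN = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_paligemma_boxes(result, resolution_wh):
    """Return a list of (xyxy, label) pairs in pixel coordinates."""
    width, height = resolution_wh
    boxes = []
    for y1, x1, y2, x2, label in LOC_PATTERN.findall(result):
        # Scale the normalized grid values to the image size, in xyxy order.
        xyxy = (
            int(x1) / 1024 * width,
            int(y1) / 1024 * height,
            int(x2) / 1024 * width,
            int(y2) / 1024 * height,
        )
        boxes.append((xyxy, label.strip()))
    return boxes


# Hypothetical response string, for illustration only.
example = "<loc0256><loc0256><loc0768><loc0768> cat"
print(parse_paligemma_boxes(example, resolution_wh=(1000, 1000)))
# [((250.0, 250.0, 750.0, 750.0), 'cat')]
```

Because the coordinates are scaled by resolution_wh, passing a size that does not match your image would shift every crop, so it is worth checking this value first if crops look misplaced.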

Crop Detections

You can crop detections using the supervision crop_image function and save the results with the ImageSink class.

Use the code below to crop detections:


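import cv2

# Load the image that was passed to PaliGemma (placeholder path).
image = cv2.imread('path/to/image.jpg')

# Crop each bounding box and save it as a separate image.
with sv.ImageSink(target_dir_path='target/directory/path') as sink:
    for xyxy in detections.xyxy:
        cropped_image = sv.crop_image(image=image, xyxy=xyxy)
        sink.save_image(image=cropped_image)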

This code crops every detection and saves each crop as a separate image in the target directory. You can also process the crops in memory by working with the cropped_image array directly instead of saving it.
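If you want the saved crops to be easy to identify, one option is to name each file after its class. The helper below is a small sketch of that idea. The name crop_names is hypothetical and not part of supervision. It takes the class ids from detections.class_id and the same class list passed when loading the detections.

```python
def crop_names(class_ids, classes, extension="png"):
    """Build one file name per detection, numbered per class."""
    counts = {}
    names = []
    for class_id in class_ids:
        label = classes[int(class_id)]
        index = counts.get(label, 0)
        counts[label] = index + 1
        names.append(f"{label}_{index}.{extension}")
    return names


print(crop_names([0, 1, 0], ["cat", "dog"]))
# ['cat_0.png', 'dog_0.png', 'cat_1.png']
```

Assuming your installed version of supervision accepts an image_name argument in ImageSink.save_image, you could pair each name with its box using zip(detections.xyxy, names) and pass the name when saving.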

Next Steps

supervision provides a wide range of utilities for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Compute confusion matrices.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.