How to create a DETR confusion matrix

Learn how to calculate confusion matrices for model results using the open source supervision Python package.

Overview

Evaluation is an essential part of the computer vision model development process. While you are striving to build the first version of your model, model evaluation will help you understand baseline performance and judge how close your model is to being ready for production. When working on future versions of a model, evaluation helps you understand the impact of each change you make.

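One key part of evaluating models is computing confusion matrices. A confusion matrix is a table, usually plotted as a grid, that compares the classes your model predicts with the ground-truth classes in your dataset. For each class the model was trained on, it shows how often the model gets that class right and which other classes it confuses it with.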
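As a minimal sketch of the idea, here is how a confusion matrix is counted in the simpler single-label classification case, using NumPy. The labels are hypothetical, and the layout (rows are ground-truth classes, columns are predicted classes) is a convention chosen for this example. Detection confusion matrices, such as the one supervision builds, typically add an extra background row and column to count missed objects and spurious predictions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each ground-truth class (row) is predicted as each class (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at every (true, predicted) pair; np.add.at handles repeated pairs correctly
    np.add.at(matrix, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return matrix


# Hypothetical labels for a 3-class problem: 0 = cat, 1 = dog, 2 = bird
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

m = confusion_matrix(y_true, y_pred, num_classes=3)
print(m)
```

Correct predictions land on the diagonal; in this toy example, one cat was mistaken for a dog and one bird was mistaken for a cat.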

In this guide, we are going to show you how to use the open source supervision Python package to create a confusion matrix.

We will:

1. Install supervision
2. Run inference on a dataset using a DETR model
3. Create and plot a confusion matrix for the model

Without further ado, let's get started!

Install supervision

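First, install the supervision pip package. The DETR code in this guide also uses PyTorch and Hugging Face Transformers, and the facebook/detr-resnet-50 checkpoint loads its ResNet backbone through timm, so install those as well:

pip install supervision torch transformers timm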


Once you have installed these packages, you are ready to load your data and compute a confusion matrix.

Load Data and Compute Matrix

First, we are going to load our dataset into a supervision.DetectionDataset() object. This object contains the images in a dataset along with their ground-truth annotations. supervision can load datasets in several annotation formats, such as YOLO, Pascal VOC, and COCO. For this guide, we will use the YOLO data loader, sv.DetectionDataset.from_yolo().

Next, we define a callback function that takes an image, runs a DETR model on it, and returns the predictions as a supervision.Detections object.

We will use that callback to run inference on every image in our dataset and compute a confusion matrix that shows how the model performs on the dataset.

Create a new Python file and add the following code:


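import cv2
import numpy as np
import torch
import supervision as sv
from transformers import DetrImageProcessor, DetrForObjectDetection

# Load the pretrained DETR checkpoint and its matching pre/post-processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# Replace ... with the paths to your YOLO-format dataset
dataset = sv.DetectionDataset.from_yolo(...)


def callback(image: np.ndarray) -> sv.Detections:
    # supervision loads dataset images with OpenCV (BGR order); DETR expects RGB
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    inputs = processor(images=image_rgb, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # A NumPy image has shape (height, width, channels); DETR wants (height, width)
    height, width = image.shape[:2]
    target_sizes = torch.tensor([[height, width]])
    results = processor.post_process_object_detection(
        outputs=outputs, target_sizes=target_sizes
    )[0]

    # Convert the Transformers output into a supervision Detections object
    return sv.Detections.from_transformers(
        transformers_results=results,
        id2label=model.config.id2label,
    )


# Run the callback on every image and compare predictions with the annotations
confusion_matrix = sv.ConfusionMatrix.benchmark(
    dataset=dataset,
    callback=callback,
)

confusion_matrix.plot()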

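Replace the ... in sv.DetectionDataset.from_yolo(...) with the paths to your dataset, passed as the images_directory_path, annotations_directory_path, and data_yaml_path arguments.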
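One thing to check before running: the facebook/detr-resnet-50 checkpoint predicts COCO class IDs, while the confusion matrix is built against the class indices of your dataset. Unless your dataset uses the same IDs, predictions and annotations will not line up. A minimal sketch of one way to reconcile them, assuming your dataset's class names exactly match DETR's label names, is to remap IDs by name. remap_class_ids is a hypothetical helper written for this example, not part of supervision, and the label dictionary below is a toy stand-in.

```python
import numpy as np


def remap_class_ids(class_ids, id2label, dataset_classes):
    """Hypothetical helper: map model class IDs to dataset class indices by name.

    Returns (remapped_ids, keep). IDs whose label is not one of
    dataset_classes are set to -1 and marked False in keep.
    """
    # Index of each dataset class name, e.g. {"car": 0, "person": 1}
    name_to_index = {name: index for index, name in enumerate(dataset_classes)}
    remapped_ids = np.array(
        [name_to_index.get(id2label.get(int(class_id)), -1) for class_id in class_ids],
        dtype=int,
    )
    keep = remapped_ids >= 0
    return remapped_ids, keep


# Toy stand-ins: a few entries in the style of DETR's COCO id2label,
# and a hypothetical two-class dataset.
id2label = {1: "person", 3: "car", 18: "dog"}
dataset_classes = ["car", "person"]

ids, keep = remap_class_ids(np.array([3, 18, 1]), id2label, dataset_classes)
print(ids, keep)
```

Inside the callback, you could call this with detections.class_id, model.config.id2label, and dataset.classes, then keep only the matched predictions with detections = detections[keep] and set detections.class_id = ids[keep].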

Then, run the code to create the confusion matrix.
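To see what benchmark has to do for each image, here is a simplified sketch of how a detection confusion matrix can be filled in: predictions are matched to ground-truth boxes greedily by IoU, matched pairs are counted at (true class, predicted class), unmatched ground-truth boxes are counted in a background column, and unmatched predictions in a background row. This is a generic NumPy illustration of the technique, not supervision's implementation; the box format, the greedy tie-breaking, and the omission of any confidence filtering are assumptions of the sketch.

```python
import numpy as np


def box_iou(boxes_a, boxes_b):
    """Pairwise IoU for (x_min, y_min, x_max, y_max) boxes: (N, 4) x (M, 4) -> (N, M)."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    overlap = np.clip(bottom_right - top_left, 0, None)
    intersection = overlap[..., 0] * overlap[..., 1]
    return intersection / (area_a[:, None] + area_b[None, :] - intersection)


def detection_confusion_matrix(gt_boxes, gt_classes, pred_boxes, pred_classes,
                               num_classes, iou_threshold=0.5):
    """Rows = ground truth, columns = predictions, index num_classes = background."""
    background = num_classes
    matrix = np.zeros((num_classes + 1, num_classes + 1), dtype=int)
    matched_gt, matched_pred = set(), set()

    if len(gt_boxes) and len(pred_boxes):
        iou = box_iou(gt_boxes, pred_boxes)
        # Candidate pairs above the threshold, visited from highest to lowest IoU
        pairs = np.argwhere(iou >= iou_threshold)
        pairs = pairs[np.argsort(-iou[pairs[:, 0], pairs[:, 1]], kind="stable")]
        for g, p in pairs:
            g, p = int(g), int(p)
            if g in matched_gt or p in matched_pred:
                continue  # each box may be matched at most once
            matrix[gt_classes[g], pred_classes[p]] += 1
            matched_gt.add(g)
            matched_pred.add(p)

    for g in range(len(gt_boxes)):
        if g not in matched_gt:
            matrix[gt_classes[g], background] += 1  # missed object (false negative)
    for p in range(len(pred_boxes)):
        if p not in matched_pred:
            matrix[background, pred_classes[p]] += 1  # spurious prediction (false positive)
    return matrix


# Hypothetical single image with two classes: 0 = cat, 1 = dog
gt_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [50, 50, 60, 60]], dtype=float)
gt_classes = np.array([0, 1, 1])
pred_boxes = np.array([[0, 0, 10, 10], [21, 21, 30, 30], [100, 100, 110, 110]], dtype=float)
pred_classes = np.array([0, 0, 1])

m = detection_confusion_matrix(gt_boxes, gt_classes, pred_boxes, pred_classes, num_classes=2)
print(m)
```

In this toy image, one cat is detected correctly, one dog is labeled as a cat, one dog is missed, and one prediction matches no object. A full benchmark sums such per-image matrices over the whole dataset.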

Plot Confusion Matrix

We can plot the confusion matrix showing the results of the DETR model evaluation using the following line of code:


confusion_matrix.plot()
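Beyond the plot, you can read per-class precision and recall directly from a detection confusion matrix. The sketch below assumes ground-truth classes as rows, predicted classes as columns, and background as the last row and column; check the supervision documentation for the orientation of ConfusionMatrix.matrix before applying it to real results. The toy matrix is hypothetical.

```python
import numpy as np


def per_class_precision_recall(matrix):
    """Assumed layout: rows = ground truth, columns = predictions, last index = background."""
    m = np.asarray(matrix, dtype=float)
    num_classes = m.shape[0] - 1
    true_positives = np.diag(m)[:num_classes]
    # Everything predicted as class c (including predictions on background)
    predicted = m[:, :num_classes].sum(axis=0)
    # Every ground-truth object of class c (including missed ones)
    actual = m[:num_classes, :].sum(axis=1)
    # Avoid division by zero for classes that never appear
    precision = np.divide(true_positives, predicted,
                          out=np.zeros_like(true_positives), where=predicted > 0)
    recall = np.divide(true_positives, actual,
                       out=np.zeros_like(true_positives), where=actual > 0)
    return precision, recall


# Hypothetical 2-class matrix plus a background row and column
toy_matrix = [[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0]]
precision, recall = per_class_precision_recall(toy_matrix)
print(precision, recall)
```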

Next Steps

supervision provides an extensive range of functionality for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Plot bounding boxes and segmentation masks.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.