Try the Model

Use the widget below to experiment with RT-DETR. You can detect COCO classes such as people, vehicles, animals, and household items.

Overview

RT-DETR (Real-Time Detection Transformer) is an object detection model developed by Baidu that integrates a Transformer-based architecture to achieve high accuracy while maintaining real-time performance. Unlike conventional detection models that rely on separate object proposal and classification stages, RT-DETR employs an end-to-end design, streamlining the detection process for greater efficiency.

One of RT-DETR's key strengths is its ability to balance speed and accuracy, making it well-suited for applications that demand rapid processing, such as autonomous driving and video surveillance. By leveraging the Transformer architecture, the model effectively captures complex spatial relationships in images while ensuring low-latency inference. This combination of efficiency and precision makes RT-DETR an attractive option for developers and researchers working in real-time object detection scenarios.

Key features

Real-Time Performance: Runs at 74 to 199 FPS on a T4 GPU depending on the variant, which suits time-sensitive applications.
End-to-End Architecture: Eliminates the need for post-processing steps like non-maximum suppression (NMS), streamlining the detection process.
Efficient Hybrid Encoder: Combines intra-scale feature interaction and cross-scale feature fusion to enhance detection capabilities across various object sizes.
IoU-aware Query Selection: Improves object query initialization by focusing on the most relevant objects, thus enhancing accuracy.
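
To make the NMS-free claim concrete, here is a minimal sketch contrasting classic greedy NMS with score-only filtering of per-query predictions. It is an illustration under simplifying assumptions, not RT-DETR's actual post-processing code: the function names, thresholds, and sample boxes are all made up for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Classic post-processing: keep the best box, drop boxes that overlap it."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def select_queries(dets: List[Detection], score_thr: float = 0.5) -> List[Detection]:
    """End-to-end style: one prediction per query, filtered by score only."""
    return [d for d in dets if d[1] >= score_thr]


# A dense detector typically emits several overlapping boxes per object.
dense_output: List[Detection] = [
    ((10, 10, 50, 50), 0.90),
    ((12, 11, 52, 49), 0.85),      # near-duplicate of the first box
    ((100, 100, 140, 150), 0.80),
]

# A set-prediction model is trained to emit one box per object, so the
# unmatched queries simply carry low scores (hypothetical values).
query_output: List[Detection] = [
    ((10, 10, 50, 50), 0.92),
    ((100, 100, 140, 150), 0.81),
    ((0, 0, 5, 5), 0.03),          # "no object" query
]

print(len(greedy_nms(dense_output)))      # 2
print(len(select_queries(query_output)))  # 2
```

Both paths end with two detections, but the second needs no pairwise IoU comparisons and no IoU threshold to tune, which is the practical benefit of the end-to-end design.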

Technical specifications

Feature	Specification
Launch date	2024
Company	Baidu
License	Apache License 2.0
Model family	RT-DETR-S, RT-DETR-M, RT-DETR-L, RT-DETR-X
Parameter size	RT-DETR-S: 20M; RT-DETR-M: 36M; RT-DETR-L: 45M; RT-DETR-X: 86M
COCO mAP (mean average precision)	RT-DETR-S: 48.1%; RT-DETR-M: 51.9%; RT-DETR-L: 53.0%; RT-DETR-X: 54.8%
Inference speed (T4 GPU)	RT-DETR-S: 199 FPS; RT-DETR-M: 133 FPS; RT-DETR-L: 114 FPS; RT-DETR-X: 74 FPS
Architecture type	Transformer-based
Unique features	Efficient Hybrid Encoder, IoU-aware Query Selection
Paper	RT-DETR Paper
Try this model	How to train RT-DETR on a custom dataset
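
The FPS figures in the table translate directly into a per-image latency (1000 / FPS milliseconds), which is often the easier number to plan around. The sketch below copies the mAP and FPS values from the table as-is and picks the most accurate variant that fits a latency budget. The helper names `latency_ms` and `pick_variant` are hypothetical, and the calculation ignores pre-processing, post-processing, and hardware other than a T4.

```python
from typing import Optional

# mAP and FPS figures copied from the specification table above.
SPECS = {
    "RT-DETR-S": {"map": 48.1, "fps": 199},
    "RT-DETR-M": {"map": 51.9, "fps": 133},
    "RT-DETR-L": {"map": 53.0, "fps": 114},
    "RT-DETR-X": {"map": 54.8, "fps": 74},
}


def latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


def pick_variant(budget_ms: float) -> Optional[str]:
    """Return the highest-mAP variant whose latency fits the budget, if any."""
    candidates = [
        name for name, spec in SPECS.items()
        if latency_ms(spec["fps"]) <= budget_ms
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: SPECS[name]["map"])


for name, spec in SPECS.items():
    print(f"{name}: {latency_ms(spec['fps']):.1f} ms/frame, {spec['map']}% mAP")

print(pick_variant(10.0))  # RT-DETR-L
print(pick_variant(4.0))   # None
```

With a 10 ms budget the sketch selects RT-DETR-L, since RT-DETR-X needs roughly 13.5 ms per frame at the listed 74 FPS.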

RT-DETR License

RT-DETR is licensed under the Apache License 2.0.

Performance

On the COCO benchmark, RT-DETR outperforms YOLOv8.

Deploy an RT-DETR API

You can use Roboflow Inference to deploy an RT-DETR API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).

Below are instructions on how to deploy your own model API.
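
Whichever deployment route you follow, the client usually has to turn the API's detections into drawable boxes. The sketch below assumes a JSON response with a `predictions` list whose entries hold a center point (`x`, `y`), a `width`, a `height`, a `confidence`, and a `class`. That layout is an assumption for illustration, so check it against what your own endpoint returns. The helper name `to_corner_boxes` and the sample payload are made up.

```python
from typing import Dict, List, Tuple

CornerBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_corner_boxes(
    response: Dict, min_confidence: float = 0.5
) -> List[Tuple[str, float, CornerBox]]:
    """Convert center-format predictions to (class, confidence, corner box)."""
    results = []
    for pred in response.get("predictions", []):
        if pred["confidence"] < min_confidence:
            continue  # skip low-confidence detections
        half_w, half_h = pred["width"] / 2, pred["height"] / 2
        box = (
            pred["x"] - half_w,
            pred["y"] - half_h,
            pred["x"] + half_w,
            pred["y"] + half_h,
        )
        results.append((pred["class"], pred["confidence"], box))
    return results


# Hypothetical response payload, not real model output.
sample_response = {
    "predictions": [
        {"x": 100.0, "y": 80.0, "width": 40.0, "height": 20.0,
         "confidence": 0.91, "class": "person"},
        {"x": 30.0, "y": 30.0, "width": 10.0, "height": 10.0,
         "confidence": 0.20, "class": "dog"},
    ]
}

for label, score, box in to_corner_boxes(sample_response):
    print(label, score, box)  # person 0.91 (80.0, 70.0, 120.0, 90.0)
```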

Label Data Automatically with RT-DETR

You can automatically label a dataset using RT-DETR with help from Autodistill, an open-source package for training computer vision models. You can label a folder of images automatically with only a few lines of code. The labeled images can then be used to train a computer vision model.