Use the widget below to experiment with YOLOv12. You can detect COCO classes such as people, vehicles, animals, and household items.
YOLOv12 is a newly proposed attention-centric variant of the YOLO family that focuses on incorporating efficient attention mechanisms into the backbone while preserving real-time performance. Instead of relying heavily on CNN-based architectures like its predecessors, YOLOv12 introduces a simple yet powerful “area attention” module, which strategically partitions the feature map to reduce the quadratic complexity of full self-attention. It also adopts residual efficient layer aggregation networks (R-ELAN) to enhance feature aggregation and training stability, especially for larger models. These innovations, together with refinements such as scaled residual connections and a reduced MLP ratio, enable YOLOv12 to harness the benefits of attention (e.g., better global context modeling) without sacrificing speed.
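To make the area attention idea concrete, here is a minimal PyTorch sketch under simplifying assumptions: a single head, the feature map split into horizontal strips, and illustrative names such as `AreaAttention` and `num_areas` that are not from the paper. The official YOLOv12 module differs in its exact formulation; this only shows how partitioning bounds the attention cost.

```python
import torch
import torch.nn as nn


class AreaAttention(nn.Module):
    """Sketch: split the feature map into `num_areas` horizontal strips and
    run self-attention within each strip, so per-strip cost scales with
    (HW / num_areas)^2 instead of (HW)^2 for the full map."""

    def __init__(self, dim: int, num_areas: int = 4):
        super().__init__()
        self.num_areas = num_areas
        self.qkv = nn.Linear(dim, dim * 3)   # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)      # output projection
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H must be divisible by num_areas.
        B, C, H, W = x.shape
        # Flatten each strip into its own attention sequence.
        x = x.permute(0, 2, 3, 1).reshape(B * self.num_areas, (H // self.num_areas) * W, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        out = self.proj(out)
        # Restore the (B, C, H, W) layout.
        return out.reshape(B, H, W, C).permute(0, 3, 1, 2)
```

Full self-attention over an H×W map costs O((HW)²); attending within `num_areas` strips cuts that to roughly O((HW)² / num_areas), which is the complexity reduction the design targets.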
YOLOv12 achieves lower latency and higher accuracy than previous YOLO models when validated on the Microsoft COCO dataset. See the object detection model leaderboard for more details.
You can use the model for real-time object detection in images and video.
You can see the model source code in the official model GitHub repository.
YOLOv12 is licensed under an AGPL-3.0 license.
You can use Roboflow Inference to deploy a YOLOv12 API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
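As a rough preview, the sketch below queries a locally running Inference server with the `inference_sdk` client. It assumes you have started the server (for example with `pip install inference-cli && inference server start`) and that a YOLOv12 checkpoint is available under a model ID such as `yolov12n-640`; that alias is an assumption here, and the exact ID may differ in your setup.

```python
# Minimal sketch: query a locally running Roboflow Inference server.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",      # default local server address
    api_key="YOUR_ROBOFLOW_API_KEY",      # placeholder; use your own key
)

# "yolov12n-640" is an assumed model ID for illustration only.
result = client.infer("image.jpg", model_id="yolov12n-640")

# Detection responses include a list of predictions with class and confidence.
for prediction in result["predictions"]:
    print(prediction["class"], prediction["confidence"])
```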