Top Real Time Vision Models

Explore models that run in real time (or close to real time).

Deploy select models (e.g., YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.
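Whether a model counts as "real time" depends on your hardware and on the frame rate you need. As a rough, minimal sketch of how you might check this yourself, the snippet below times repeated calls to an inference function and compares the result to a target frame rate. The names `measure_fps`, `is_real_time`, and `fake_model`, and the 30 FPS default, are illustrative assumptions and not part of any Roboflow API; swap `fake_model` for a call into whichever model you deploy.

```python
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the average frames per second over `runs` timed calls to `infer`."""
    for _ in range(warmup):
        infer()  # warm-up calls are not timed (caches, lazy initialization)
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return runs / elapsed if elapsed > 0 else float("inf")


def is_real_time(fps: float, target_fps: float = 30.0) -> bool:
    """Treat a model as real time if it meets the target frame rate (assumed 30 FPS)."""
    return fps >= target_fps


def fake_model() -> None:
    """Hypothetical stand-in for a model call; sleeps about 5 ms per frame."""
    time.sleep(0.005)


fps = measure_fps(fake_model)
print(f"{fps:.1f} FPS, real time at 30 FPS: {is_real_time(fps)}")
```

Measured numbers vary from run to run, so treat a single measurement as an estimate and test with images at the resolution you plan to use.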

Object Detection

Deploy on Device with Roboflow✅

Object Detection

SAM 3D Objects

Model Size:

Parameters:

Architecture:

Segment Anything

SAM 3D Objects is a 3D reconstruction model. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Segment Anything 3

Model Size:

Parameters:

Architecture:

Segment Anything

Segment Anything 3 (SAM 3) is an image segmentation model released by Meta. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

RF-DETR Segmentation

Model Size:

Parameters:

Architecture:

DETR

RF-DETR Segmentation is a state-of-the-art image segmentation model that you can fine-tune on your own data. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

GPT-5

Model Size:

Parameters:

Architecture:

GPT-5 is a multimodal language model developed by OpenAI. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Depth Anything V2

Model Size:

Parameters:

Architecture:

Depth-Anything-V2 is a depth estimation model developed by researchers from HKU and TikTok. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

GPT-4.1

Model Size:

Parameters:

Architecture:

GPT

GPT-4.1 is a multimodal model developed by OpenAI that comes in three sizes: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

RF-DETR

Model Size:

Parameters:

Architecture:

Transformers

RF-DETR is a SOTA, real-time object detection model architecture developed by Roboflow and released under the Apache 2.0 license. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Claude 3.7 Sonnet

Model Size:

Parameters:

Architecture:

Claude 3.7 Sonnet is a multimodal "hybrid reasoning" model developed by Anthropic. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Phi-4 Multimodal

Model Size:

Parameters:

Architecture:

Phi-4 Multimodal is a multimodal language model developed by Microsoft. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Co-DETR

Model Size:

Parameters:

Architecture:

Transformers

Co-Deformable-DETR (Co-DETR) is an object detection model architecture introduced in the paper "DETRs with Collaborative Hybrid Assignments Training". Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

D-FINE

Model Size:

Parameters:

Architecture:

Transformers

D-FINE is a real-time object detection model introduced in the paper "D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement". Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

DEIM

Model Size:

Parameters:

Architecture:

DEIM is a training framework for DETR models. The framework strives to enable "faster convergence and improved accuracy" in models. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOE

Model Size:

Parameters:

Architecture:

YOLO

YOLOE is a new object detection and segmentation model developed by the creators of YOLOv10. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

SmolVLM2

Model Size:

Parameters:

Architecture:

Transformers

SmolVLM2 is a multimodal image and video understanding model developed by engineers on the Hugging Face TB (Textbook) Research team. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Moondream 2

Model Size:

Parameters:

Architecture:

Moondream 2 is the latest model in the Moondream series of “tiny vision language models”. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Gemma 3

Model Size:

Parameters:

Architecture:

Transformers

Gemma 3 is a multimodal language model developed by Google. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

OpenAI o3-mini

Model Size:

Parameters:

Architecture:

GPT

OpenAI o3-mini is a multimodal reasoning model developed by OpenAI. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Qwen2.5-VL

Model Size:

Parameters:

Architecture:

Transformers

Qwen2.5-VL is a multimodal vision-language model developed by the Qwen team at Alibaba Cloud. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOv12

Model Size:

Parameters:

Architecture:

YOLO

YOLOv12 is a state-of-the-art computer vision model you can use for detection, segmentation, and more. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

PaliGemma-2

Model Size:

Parameters:

Architecture:

PaliGemma-2 is a multimodal model developed by Google. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLO11

Model Size:

Parameters:

Architecture:

YOLO11 is a computer vision model that you can use for object detection, segmentation, and classification. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

YOLOv9 Image Segmentation

Model Size:

Parameters:

Architecture:

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Florence 2 Image Captioning

Model Size:

Parameters:

Architecture:

Florence-2 Image Captioning is a subset of Florence-2 that supports describing images with text. Learn more »

Optical Character Recognition

Deploy on Device with Roboflow✅

Optical Character Recognition

Florence 2 OCR

Model Size:

Parameters:

Architecture:

Florence-2 OCR is a subset of Florence-2 that can read characters in images. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Florence 2 Image Segmentation

Model Size:

Parameters:

Architecture:

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Florence 2 Object Detection

Model Size:

Parameters:

Architecture:

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Phi-3.5

Model Size:

Parameters:

6600000000

Architecture:

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Segment Anything 2

Model Size:

Parameters:

Architecture:

Segment Anything

Segment Anything 2 (SAM 2) is a real-time image and video segmentation model. Learn more »

Keypoint Detection

Deploy on Device with Roboflow✅

Keypoint Detection

MediaPipe

Model Size:

Parameters:

Architecture:

Object Detection

Deploy on Device with Roboflow✅

Object Detection

RT-DETR

Model Size:

Parameters:

76000000

Architecture:

DETR

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Cambrian

Model Size:

Parameters:

Architecture:

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

4M

Model Size:

2800000000

Parameters:

705000000

Architecture:

The 4M model is a versatile multimodal Transformer model developed by EPFL and Apple, capable of handling a handful of vision and language tasks. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Florence 2

Model Size:

770000000

Parameters:

770000000

Architecture:

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

PaliGemma Optical Character Recognition

Model Size:

Parameters:

Architecture:

You can use the set of PaliGemma weights trained on the OCRVQA dataset for performing OCR on images. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

PaliGemma Document VQA

Model Size:

Parameters:

Architecture:

You can use the set of PaliGemma weights trained on the DocVQA dataset for asking questions about documents. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

PaliGemma VQA

Model Size:

Parameters:

Architecture:

You can use the set of PaliGemma weights trained on the VQAv2 dataset for asking questions about the contents of images. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

PaliGemma Website Understanding

Model Size:

Parameters:

Architecture:

You can use the set of PaliGemma weights trained on the Screen2Words dataset for asking questions about website screenshots. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

PaliGemma Image Captioning

Model Size:

Parameters:

Architecture:

You can use the set of PaliGemma weights trained on the COCO Captions dataset for zero-shot image captioning. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOv10

Model Size:

Parameters:

29500000

Architecture:

YOLO

YOLOv10 is a real-time object detection model introduced in the paper "YOLOv10: Real-Time End-to-End Object Detection". Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

GPT-4o

Model Size:

Parameters:

Architecture:

GPT-4o is OpenAI’s third major iteration of GPT-4, expanding on the capabilities of GPT-4 with Vision. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

PaliGemma

Model Size:

4000000000

Parameters:

3 Billion

Architecture:

PaliGemma is a vision language model (VLM) by Google that has multimodal capabilities. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

MMOCR

Model Size:

Parameters:

Architecture:

MMOCR is an Optical Character Recognition model zoo implemented with the MMDetection package. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

TrOCR

Model Size:

Parameters:

Architecture:

TrOCR is a Transformer-based OCR model developed by researchers from Microsoft Research. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Tesseract

Model Size:

Parameters:

Architecture:

Tesseract is a highly popular OCR engine, now developed primarily as an open source project. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Surya

Model Size:

Parameters:

Architecture:

Surya is a Python package designed for OCR and document layout analysis. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Google Gemini

Model Size:

Parameters:

Architecture:

Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind focused specifically on multimodality. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

ResNet-50

Model Size:

Parameters:

25600000

Architecture:

Residual Neural Networks

ResNet-50 is a popular image classification model architecture. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

Anthropic Claude 3

Model Size:

Parameters:

Architecture:

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

EasyOCR

Model Size:

Parameters:

50000000

Architecture:

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOv8 Oriented Bounding Boxes

Model Size:

69500000

Parameters:

69500000

Architecture:

YOLO

You can retrieve bounding boxes whose edges match an angled object by training an oriented bounding boxes object detection model, such as YOLOv8's Oriented Bounding Boxes model. Learn more »
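To make the idea of an oriented bounding box concrete, here is a minimal sketch that converts a box described by a center, width, height, and rotation angle into its four corner points. The `(cx, cy, w, h, angle_rad)` parameterization and the `obb_corners` helper are assumptions made for illustration; the exact output format of a given oriented bounding box model may differ, so check its documentation before relying on this layout.

```python
import math
from typing import List, Tuple


def obb_corners(cx: float, cy: float, w: float, h: float,
                angle_rad: float) -> List[Tuple[float, float]]:
    """Return the four corners of a box of size (w, h) centered at (cx, cy)
    and rotated by angle_rad about its center (standard 2D rotation)."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets from the center before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# With no rotation the result is an ordinary axis-aligned box.
axis_aligned = obb_corners(10.0, 10.0, 4.0, 2.0, 0.0)
# A quarter turn swaps the roles of width and height.
quarter_turn = obb_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

An axis-aligned box around a rotated object includes extra background, while the rotated corners follow the object's edges.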

Classification

Deploy on Device with Roboflow✅

Classification

AltCLIP

Model Size:

Parameters:

Architecture:

AltCLIP is a zero-shot image classification model. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

RemoteCLIP

Model Size:

Parameters:

Architecture:

RemoteCLIP is a zero-shot classification model for remote sensing. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

BioCLIP

Model Size:

Parameters:

Architecture:

BioCLIP is a Vision Foundation Model for the Tree of Life. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

MobileCLIP

Model Size:

Parameters:

Architecture:

MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

SigLIP

Model Size:

Parameters:

878000000

Architecture:

SigLIP is an image embedding model defined in the "Sigmoid Loss for Language Image Pre-Training" paper. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOv9

Model Size:

Parameters:

57300000

Architecture:

YOLO

YOLOv9 is an object detection model architecture released on February 21st, 2024. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLO-World

Model Size:

Parameters:

69000000

Architecture:

YOLO

YOLO-World is a zero-shot object detection model. Learn more »

Keypoint Detection

Deploy on Device with Roboflow✅

Keypoint Detection

YOLO-NAS Pose

Model Size:

Parameters:

Architecture:

YOLO

YOLO-NAS Pose is a keypoint detection model developed by Deci AI. Learn more »

Keypoint Detection

Deploy on Device with Roboflow✅

Keypoint Detection

YOLOv8 Pose Estimation

Model Size:

Parameters:

Architecture:

YOLO

The YOLOv8 pose estimation model allows you to detect keypoints in an image. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Grounded EdgeSAM

Model Size:

Parameters:

Architecture:

Grounded EdgeSAM is a combination of Grounding DINO, a zero-shot object detection model, and EdgeSAM, a fast zero-shot image segmentation model. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

BakLLaVA

Model Size:

Parameters:

13000000000

Architecture:

BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

CogVLM

Model Size:

Parameters:

6500000000

Architecture:

CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks. Learn more »

Multimodal Model

Deploy on Device with Roboflow✅

Multimodal Model

QwenVL

Model Size:

7000000000

Parameters:

Architecture:

Qwen-VL is an LMM developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs. The model can output text and bounding boxes. Qwen-VL naturally supports English, Chinese, and multilingual conversation. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

VLPart

Model Size:

Parameters:

Architecture:

VLPart, developed by Meta Research, is an object detection and segmentation model that works with an open vocabulary. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

CoDet

Model Size:

Parameters:

Architecture:

CoDet is an open vocabulary zero-shot object detection model. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

GPT-4 with Vision

Model Size:

Parameters:

Architecture:

Transformer

GPT-4 with Vision is a multimodal language model developed by OpenAI. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Grounding DINO

Model Size:

Parameters:

Architecture:

Grounding DINO is a state-of-the-art zero-shot object detection model, developed by IDEA Research. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

BLIP

Model Size:

Parameters:

Architecture:

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Grounded SAM

Model Size:

Parameters:

Architecture:

Combination of Grounding DINO and Segment Anything

Grounded SAM combines Grounding DINO with the Segment Anything Model to identify and segment objects in an image given text captions. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

SAM-CLIP

Model Size:

Parameters:

Architecture:

Combination of Segment Anything and CLIP

Use Grounding DINO, Segment Anything, and CLIP to label objects in images. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

BLIPv2

Model Size:

Parameters:

Architecture:

BLIPv2 is a multimodal model developed by Salesforce Research. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

ALBEF

Model Size:

Parameters:

Architecture:

Object Detection

Deploy on Device with Roboflow✅

Object Detection

OWL ViT

Model Size:

Parameters:

Architecture:

OWL-ViT is a transformer-based object detection model developed by Google Research. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

FastViT

Model Size:

Parameters:

Architecture:

FastViT is a fast image classification model developed by Apple. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

MetaCLIP

Model Size:

Parameters:

Architecture:

CLIP

MetaCLIP is a zero-shot classification and embedding model developed by Meta AI. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

OWLv2

Model Size:

Parameters:

Architecture:

OWLv2 is a transformer-based object detection model developed by Google Research. OWLv2 is the successor to OWL ViT. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

LLaVA-1.5

Model Size:

Parameters:

13000000000

Architecture:

LLaVA is an open source multimodal language model that you can use for visual question answering and that has limited support for object detection. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

Kosmos-2

Model Size:

Parameters:

Architecture:

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

L2CS-Net

Model Size:

Parameters:

Architecture:

L2CS-Net is a gaze estimation model that enables you to estimate where, and in what direction, someone is looking. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

DocTR

Model Size:

Parameters:

Architecture:

DocTR is an Optical Character Recognition tool powered by deep learning. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

DINOv2

Model Size:

Parameters:

Architecture:

DINOv2 is a self-supervised method for training computer vision models developed by Meta Research and released in April 2023. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

RTMDet

Model Size:

Parameters:

Architecture:

RTMDet is an efficient real-time object detector, with self-reported metrics outperforming the YOLO series. It achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, making it one of the fastest and most accurate object detectors available as of writing this post. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

YOLACT

Model Size:

Parameters:

Architecture:

YOLACT is a simple, fully convolutional model for real-time instance segmentation. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

ByteTrack

Model Size:

Parameters:

Architecture:

ByteTrack is a multi-object tracking computer vision algorithm. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

FastSAM

Model Size:

Parameters:

636000000

Architecture:

FastSAM is an image segmentation model trained using 2% of the data in the Segment Anything Model SA-1B dataset. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

DETIC

Model Size:

25000000

Parameters:

Architecture:

Detic is an open source segmentation model developed by Meta Research and released in 2022. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLO-NAS

Model Size:

67000000

Parameters:

Architecture:

YOLO

YOLO-NAS is an object detection model developed by Deci that achieves SOTA performance compared to YOLOv5, YOLOv7, and YOLOv8. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

Segment Anything Model (SAM)

Model Size:

Parameters:

Architecture:

Segment Anything (SAM) is an image segmentation model developed by Meta Research, capable of doing zero-shot segmentation. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

GroundingDINO

Model Size:

Parameters:

Architecture:

Grounding DINO is a zero-shot object detection model made by combining a Transformer-based DINO detector and grounded pre-training. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

DETR

Model Size:

60000000

Parameters:

Architecture:

Transformers

Detection Transformer (DETR) is an end-to-end object detection model implemented using the Transformer architecture. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

YOLOv8 Classification

Model Size:

Parameters:

68200000

Architecture:

YOLO

An image classification model built using YOLOv8. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

YOLOv8 Instance Segmentation

Model Size:

Parameters:

68200000

Architecture:

YOLO

The state-of-the-art YOLOv8 model comes with support for instance segmentation tasks. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOv8

Model Size:

68200000

Parameters:

Architecture:

YOLO, CNN

YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

YOLOv7 Instance Segmentation

Model Size:

Parameters:

150000000

Architecture:

YOLO

YOLOv7 Instance Segmentation lets you perform segmentation tasks with the YOLOv7 model. Learn more »

Instance Segmentation

Deploy on Device with Roboflow✅

Instance Segmentation

OneFormer

Model Size:

Parameters:

219 million

Architecture:

Transformers

OneFormer is a state-of-the-art multi-task image segmentation framework that is implemented using transformers. Learn more »

Classification

Deploy on Device with Roboflow✅

Classification

ResNet 32

Model Size:

Parameters:

460,000

Architecture:

A fast, simple convolutional neural network that gets the job done for many tasks, including classification. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOX

Model Size:

68.7

Parameters:

99.1 million

Architecture:

CNN, YOLO

YOLOX is a high-performance object detection model. Learn more »

Object Detection

Deploy on Device with Roboflow✅

Object Detection

YOLOR

Model Size:

202.0

Parameters:

12,786,711 (S2D)

Architecture:

CNN, YOLO

YOLOR (You Only Learn One Representation) is an object detection model that uses both implicit and explicit knowledge to make predictions. Learn more »

Object Detection

Deploy on Device with Roboflow✅