Video Classification
Video classification models assign one or more labels to the contents of videos.
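One common way to get video-level labels from an image classifier is to score sampled frames individually and then aggregate the scores. The sketch below is illustrative only: it assumes per-frame label scores have already been produced by some model, and the function name `classify_video`, the averaging rule, and the `threshold` parameter are hypothetical choices, not part of any Roboflow API.

```python
from collections import defaultdict


def classify_video(frame_scores, threshold=0.5):
    """Aggregate per-frame label scores into video-level labels.

    frame_scores: list of dicts mapping label -> score in [0, 1],
                  one dict per sampled frame (assumed input format).
    threshold:    minimum mean score for a label to be assigned
                  to the whole video (hypothetical default).

    Returns a dict of label -> mean score for every label whose
    mean score across all frames meets the threshold. A label that
    is missing from a frame counts as a score of 0 for that frame.
    """
    if not frame_scores:
        return {}

    # Sum each label's score over all frames.
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score

    # Average over the number of frames, then keep confident labels.
    n_frames = len(frame_scores)
    means = {label: total / n_frames for label, total in totals.items()}
    return {label: mean for label, mean in means.items() if mean >= threshold}


if __name__ == "__main__":
    # Made-up scores for two sampled frames.
    frames = [
        {"cooking": 0.9, "talking": 0.2},
        {"cooking": 0.7, "talking": 0.6},
    ]
    print(classify_video(frames, threshold=0.5))
```

Because every label is thresholded independently, the result can contain zero, one, or several labels, which matches the "one or more labels" framing above. Other aggregation rules, such as majority voting or taking the maximum score, are equally plausible.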
Computer Vision Models
Explore state-of-the-art computer vision model architectures, immediately usable for training with your custom dataset.
Deploy select models (e.g., YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.
Segment Anything Model (SAM)
Segment Anything (SAM) is an image segmentation model developed by Meta Research, capable of doing zero-shot segmentation.
Instance Segmentation
Deploy with Roboflow
YOLOv8 Pose Estimation
The YOLOv8 pose estimation model allows you to detect keypoints in an image.
Pose Estimation
Deploy with Roboflow
YOLOv8
YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5.
Object Detection
Deploy with Roboflow
YOLOv8 Instance Segmentation
The state-of-the-art YOLOv8 model comes with support for instance segmentation tasks.
Instance Segmentation
Deploy with Roboflow
YOLOv9
YOLOv9 is an object detection model architecture released on February 21st, 2024.
Object Detection
Deploy with Roboflow
GroundingDINO
Grounding DINO is a zero-shot object detection model made by combining a Transformer-based DINO detector and grounded pre-training.
Object Detection
Deploy with Roboflow
Segment Anything 2
Segment Anything 2 (SAM 2) is a real-time image and video segmentation model.
Instance Segmentation
Deploy with Roboflow
YOLO-World
YOLO-World is a zero-shot object detection model.
Object Detection
Deploy with Roboflow
PaliGemma
PaliGemma is a vision language model (VLM) by Google that has multimodal capabilities.
Vision-Language
Deploy with Roboflow
GPT-4o
GPT-4o is OpenAI’s third major iteration of GPT-4, expanding on the capabilities of GPT-4 with Vision.
Vision-Language
Deploy with Roboflow
Tesseract
Tesseract is a highly popular OCR engine and project, now primarily developed open-source.
OCR
Deploy with Roboflow
YOLOv5 Instance Segmentation
YOLOv5 Instance Segmentation is a version of YOLOv5 that can be used for instance segmentation tasks.
Instance Segmentation
Deploy with Roboflow
YOLOv5 Classification
YOLOv5 Classification is a version of the YOLOv5 model used in single-label and multi-label image classification.
Classification
Deploy with Roboflow
YOLOv5
A very fast and easy to use PyTorch model that achieves state of the art (or near state of the art) results.
Object Detection
Deploy with Roboflow
YOLO11
YOLO11 is a computer vision model that you can use for object detection, segmentation, and classification.
Object Detection
Deploy with Roboflow
Detectron2
Detectron2 is a model zoo of its own for computer vision models written in PyTorch.
Object Detection
Deploy with Roboflow
MediaPipe
Object Detection
Deploy with Roboflow
YOLOv8 Oriented Bounding Boxes
You can retrieve bounding boxes whose edges match an angled object by training an oriented bounding boxes object detection model, such as YOLOv8's Oriented Bounding Boxes model.
Object Detection
Deploy with Roboflow
Mask RCNN
Mask RCNN is a convolutional neural network for instance segmentation.
Instance Segmentation
Deploy with Roboflow
OpenAI CLIP
CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves impressive results in a wide range of domains with no fine-tuning. It applies the recent advancements in large-scale transformers like GPT-3 to the vision arena.
Video Classification
Deploy with Roboflow
YOLOv8 Classification
An image classification model built using YOLOv8.
Classification
Deploy with Roboflow
EasyOCR
OCR
Deploy with Roboflow
LLaVA-1.5
LLaVA is an open source multimodal language model that you can use for visual question answering. It also has limited support for object detection.
Object Detection
Deploy with Roboflow
Grounded SAM
GroundedSAM combines Grounding DINO with the Segment Anything Model to identify and segment objects in an image given text captions.
Zero Shot Segmentation
Deploy with Roboflow
DETR
Detection Transformer (DETR) is an end-to-end object detection model implemented using the Transformer architecture.
Object Detection
Deploy with Roboflow
YOLOv7
YOLOv7 is a state of the art object detection model.
Object Detection
Deploy with Roboflow
YOLOv7 Instance Segmentation
YOLOv7 Instance Segmentation lets you perform segmentation tasks with the YOLOv7 model.
Instance Segmentation
Deploy with Roboflow
Vision Transformer
The Vision Transformer leverages powerful natural language processing embeddings (BERT) and applies them to images.
Classification
Deploy with Roboflow
YOLOX
YOLOX is a high-performance object detection model.
Object Detection
Deploy with Roboflow
YOLOv10
YOLOv10 is a real-time object detection model introduced in the paper "YOLOv10: Real-Time End-to-End Object Detection".
Object Detection
Deploy with Roboflow
Faster R-CNN
One of the most accurate object detection algorithms but requires a lot of power at inference time. A good choice if you can do processing asynchronously on a server.
Object Detection
Deploy with Roboflow
EfficientNet
EfficientNet is a family of image classification models from Google AI that train comparatively quickly on small amounts of data, making the most of limited datasets.
Classification
Deploy with Roboflow
YOLOv3 PyTorch
Though it is no longer the most accurate object detection algorithm, YOLO v3 is still a very good choice when you need real-time detection while maintaining excellent accuracy. PyTorch version.
Object Detection
Deploy with Roboflow
YOLOv3 Keras
Though it is no longer the most accurate object detection algorithm, YOLO v3 is still a very good choice when you need real-time detection while maintaining excellent accuracy. Keras implementation.
Object Detection
Deploy with Roboflow
FastSAM
FastSAM is an image segmentation model trained using 2% of the data in the Segment Anything Model SA-1B dataset.
Instance Segmentation
Deploy with Roboflow
MT-YOLOv6
MT-YOLOv6 is a YOLO based model released in 2022.
Object Detection
Deploy with Roboflow
Surya
Surya is a Python package designed for OCR and document layout analysis.
OCR
Deploy with Roboflow
YOLACT
A simple, fully convolutional model for real-time instance segmentation.
Instance Segmentation
Deploy with Roboflow
CogVLM
CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks.
Vision-Language
Deploy with Roboflow
YOLOv4 PyTorch
YOLOv4 has emerged as the best real time object detection model. YOLOv4 carries forward many of the research contributions of the YOLO family of models along with new modeling and data augmentation techniques. This implementation is in PyTorch.
Object Detection
Deploy with Roboflow
YOLO-NAS
YOLO-NAS is an object detection model developed by Deci that achieves state-of-the-art performance compared to YOLOv5, v7, and v8.
Object Detection
Deploy with Roboflow
YOLO-NAS Pose
YOLO-NAS Pose is a keypoint detection model developed by Deci AI.
Keypoint Detection
Deploy with Roboflow
ByteTrack
ByteTrack is a multi-object tracking computer vision algorithm.
Object Detection
Deploy with Roboflow
MMOCR
MMOCR is an Optical Character Recognition model zoo implemented with the MMDetection package.
OCR
Deploy with Roboflow
QwenVL
Qwen-VL is an LMM developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs. The model can output text and bounding boxes. Qwen-VL naturally supports English, Chinese, and multilingual conversation.
Vision-Language
Deploy with Roboflow
DocTR
DocTR is an Optical Character Recognition tool powered by deep learning.
Object Detection
Deploy with Roboflow
SegFormer
SegFormer is a computer vision framework used in semantic segmentation tasks, implemented with transformers.
Semantic Segmentation
Deploy with Roboflow
Scaled YOLOv4
Scaled YOLOv4 is an extension of the YOLOv4 research implemented in the YOLOv5 PyTorch framework.
Object Detection
Deploy with Roboflow
YOLOR
YOLOR (You Only Learn One Representation) is an object detection model that uses both implicit and explicit knowledge to make predictions.
Object Detection
Deploy with Roboflow
SigLIP
SigLIP is an image embedding model defined in the "Sigmoid Loss for Language Image Pre-Training" paper.
Classification
Deploy with Roboflow
DETIC
Detic is an open source segmentation model developed by Meta Research and released in 2022.
Instance Segmentation
Deploy with Roboflow
YOLOv5 Oriented Bounding Boxes
YOLOv5-OBB is a variant of YOLOv5 that supports oriented bounding boxes. This model is designed to yield predictions that better fit objects that are positioned at an angle.
Object Detection
Deploy with Roboflow
OneFormer
OneFormer is a state-of-the-art multi-task image segmentation framework that is implemented using transformers.
Instance Segmentation
Deploy with Roboflow
MetaCLIP
MetaCLIP is a zero-shot classification and embedding model developed by Meta AI.
Classification
Deploy with Roboflow
YOLOS
YOLOS looks at patches of an image to form "patch tokens", which are used in place of the traditional wordpiece tokens in NLP.
Object Detection
Deploy with Roboflow
4M
The 4M model is a versatile multimodal Transformer model developed by EPFL and Apple, capable of handling a handful of vision and language tasks.
Object Detection
Deploy with Roboflow
BakLLaVA
BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture.
Vision-Language
Deploy with Roboflow
L2CS-Net
L2CS-Net is a gaze estimation model that enables you to calculate where someone is looking and in what direction someone is looking.
Object Detection
Deploy with Roboflow
MobileNet SSD v2
This architecture provides good realtime results on limited compute. It's designed to run in realtime (30 frames per second) even on mobile devices.
Object Detection
Deploy with Roboflow
CoDet
CoDet is an open vocabulary zero-shot object detection model.
Object Detection
Deploy with Roboflow
ResNet 32
A fast, simple convolutional neural network that gets the job done for many tasks, including classification.
Classification
Deploy with Roboflow
Grounded EdgeSAM
Grounded EdgeSAM is a combination of Grounding DINO, a zero-shot object detection model, and EdgeSAM, a fast zero-shot image segmentation model.
Zero Shot Segmentation
Deploy with Roboflow
SAM-CLIP
Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
Instance Segmentation
Deploy with Roboflow
MobileNet V2 Classification
MobileNet is a GoogleAI model well-suited for on-device, real-time classification (distinct from MobileNetSSD, Single Shot Detector). This implementation leverages transfer learning from ImageNet to your dataset.
Classification
Deploy with Roboflow
ResNet 34
A fast, simple convolutional neural network that gets the job done for many tasks, including classification.
Classification
Deploy with Roboflow
YOLOv4 Darknet
YOLOv4 has emerged as the best real time object detection model. YOLOv4 carries forward many of the research contributions of the YOLO family of models along with new modeling and data augmentation techniques. This implementation is in Darknet.
Object Detection
Deploy with Roboflow
EfficientDet (D7) Tensorflow 2
A scalable, state of the art object detection model, implemented here within the TensorFlow 2 Object Detection API.
Object Detection
Deploy with Roboflow
EfficientDet
EfficientDet achieves the best performance in the fewest training epochs among object detection model architectures, making it a highly scalable architecture especially when operating with limited compute.
Object Detection
Deploy with Roboflow
YOLOv4 Tiny
The tiny and fast version of YOLOv4, good for training and deployment on limited compute resources and for getting a feel for your dataset.
Object Detection
Deploy with Roboflow
RTMDet
RTMDet is an efficient real-time object detector, with self-reported metrics outperforming the YOLO series. It achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, making it one of the fastest and most accurate object detectors available as of writing this post.
Object Detection
Deploy with Roboflow
DINOv2
DINOv2 is a self-supervised method for training computer vision models developed by Meta Research and released in April 2023.
Object Detection
Deploy with Roboflow
Kosmos-2
Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.
Object Detection
Deploy with Roboflow
OWLv2
OWLv2 is a transformer-based object detection model developed by Google Research. OWLv2 is the successor to OWL ViT.
Object Detection
Deploy with Roboflow
FastViT
FastViT is a fast image classification model developed by Apple.
Classification
Deploy with Roboflow
OWL ViT
OWL-ViT is a transformer-based object detection model developed by Google Research.
Object Detection
Deploy with Roboflow
ALBEF
Classification
Deploy with Roboflow
BLIPv2
BLIPv2 is a multimodal model developed by Salesforce Research.
Classification
Deploy with Roboflow
BLIP
Classification
Deploy with Roboflow
GPT-4 with Vision
GPT-4 with Vision is a multimodal language model developed by OpenAI.
Object Detection
Deploy with Roboflow
VLPart
VLPart, developed by Meta Research, is an object detection and segmentation model that works with an open vocabulary.
Object Detection
Deploy with Roboflow
MobileCLIP
MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper.
Classification
Deploy with Roboflow
BioCLIP
BioCLIP is a Vision Foundation Model for the Tree of Life.
Classification
Deploy with Roboflow
RemoteCLIP
RemoteCLIP is a zero-shot classification model for remote sensing.
Classification
Deploy with Roboflow
AltCLIP
AltCLIP is a zero-shot image classification model.
Classification
Deploy with Roboflow
Anthropic Claude 3
Vision-Language
Deploy with Roboflow
ResNet-50
Classification
Deploy with Roboflow
Google Gemini
Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind focused specifically on multimodality.
Vision-Language
Deploy with Roboflow
TrOCR
TrOCR is a Transformer-based OCR model developed by researchers from Microsoft Research.
OCR
Deploy with Roboflow
PaliGemma Image Captioning
You can use the set of PaliGemma weights trained on the COCO Captions dataset for zero-shot image captioning.
Deploy with Roboflow
PaliGemma Website Understanding
You can use the set of PaliGemma weights trained on the Screen2Words dataset for asking questions about website screenshots.
Deploy with Roboflow
PaliGemma VQA
You can use the set of PaliGemma weights trained on the VQAv2 dataset for asking questions about the contents of images.
Deploy with Roboflow
PaliGemma Document VQA
You can use the set of PaliGemma weights trained on the DocVQA dataset for asking questions about documents.
Document Question Answering (DocQA)
Deploy with Roboflow
PaliGemma Optical Character Recognition
You can use the set of PaliGemma weights trained on the OCRVQA dataset for performing OCR on images.
Deploy with Roboflow
Florence 2
Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license.
Open Vocabulary Object Detection
Deploy with Roboflow
Cambrian
Deploy with Roboflow
RT-DETR
Object Detection
Deploy with Roboflow
Phi-3.5
Vision-Language
Deploy with Roboflow
Florence 2 Object Detection
Deploy with Roboflow
Florence 2 Image Segmentation
Referring Expression Segmentation
Deploy with Roboflow
Florence 2 OCR
Florence-2 OCR is a subset of Florence-2 that can read characters in images.
Deploy with Roboflow