Top Foundation Vision Models

Foundation models are large models, pretrained on broad datasets, that you can use without any additional training. You can use foundation models to auto-label data for training a smaller, real-time vision model.

If you're more interested in deploying a model without code, check out our Roboflow Deploy product.
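As an example of the auto-labeling workflow described above, the sketch below converts hypothetical foundation-model detections into YOLO-format label files. The `detections` list and `to_yolo_label` helper are illustrative assumptions, not part of any particular library; real detections would come from one of the zero-shot models listed below.

```python
# Sketch: turning foundation-model detections into YOLO-format training
# labels. Each detection is a (class_id, box) pair, where box is an
# absolute-pixel bounding box (x_min, y_min, x_max, y_max).

def to_yolo_label(class_id, box, image_w, image_h):
    """Convert an absolute-pixel box to a normalized YOLO label line:
    'class x_center y_center width height'."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_w
    y_center = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical detections from a zero-shot detector on a 640x480 image.
detections = [(0, (100, 120, 300, 360)), (1, (400, 50, 600, 200))]
label_lines = [to_yolo_label(c, b, 640, 480) for c, b in detections]
print("\n".join(label_lines))
```

Writing one such `.txt` file per image gives you a dataset you can train a small real-time detector on.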

Object Detection

CoDet is an open vocabulary zero-shot object detection model. Learn more »

Object Detection

Grounding DINO is a state-of-the-art zero-shot object detection model, developed by IDEA Research. Learn more »

Object Detection

OWL-ViT is a transformer-based object detection model developed by Google Research. Learn more »

Classification

MetaCLIP is a zero-shot classification and embedding model developed by Meta AI. Learn more »

Object Detection

OWLv2 is a transformer-based object detection model developed by Google Research. OWLv2 is the successor to OWL-ViT. Learn more »

Object Detection

LLaVA is an open source multimodal language model that you can use for visual question answering; it has limited support for object detection. Learn more »

Object Detection

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images. Learn more »

Instance Segmentation

Detic is an open source segmentation model developed by Meta Research and released in 2022. Learn more »

Instance Segmentation

Segment Anything (SAM) is an image segmentation model developed by Meta Research, capable of zero-shot segmentation. Learn more »

Object Detection

Grounding DINO is a zero-shot object detection model made by combining a Transformer-based DINO detector and grounded pre-training. Learn more »

Classification

CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves impressive results across a wide range of domains with no fine-tuning. It applies recent advances in large-scale transformers, like GPT-3, to the vision domain. Learn more »
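A minimal sketch of the zero-shot mechanism CLIP uses: embed the image and each candidate label into a shared space, then pick the label whose embedding is most similar to the image embedding. The vectors below are made up for illustration; in practice they come from CLIP's image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    # Normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb           # cosine similarity per label
    probs = np.exp(sims * 100)             # CLIP-style temperature scaling
    probs = probs / probs.sum()            # softmax over candidate labels
    return labels[int(np.argmax(probs))], probs

labels = ["a photo of a cat", "a photo of a dog"]
image_emb = np.array([0.9, 0.1, 0.2])      # made-up image embedding
text_embs = np.array([[0.8, 0.2, 0.3],    # made-up "cat" text embedding
                      [0.1, 0.9, 0.4]])   # made-up "dog" text embedding
best, probs = zero_shot_classify(image_emb, text_embs, labels)
print(best)  # → "a photo of a cat"
```

Because the candidate labels are just text, you can swap in any class names at inference time without retraining, which is what makes the model zero-shot.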

Frequently Asked Questions

What is semantic segmentation?

Semantic segmentation models assign a class label to every pixel in an image. This tells you exactly which pixels belong to each object class. Note that semantic segmentation does not separate individual instances of the same class (i.e. two trees or two screws); distinguishing instances is the job of instance segmentation.
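Concretely, a semantic segmentation output can be represented as a 2D array the same size as the image, where each cell holds a class ID. The tiny mask below is an illustrative example, not output from any particular model.

```python
import numpy as np

# A toy semantic segmentation mask for a 3x4 image:
# 0 = background, 1 = tree, 2 = screw.
mask = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
])

# Count pixels per class. Note that two separate "tree" regions would
# both carry ID 1 -- telling them apart requires instance segmentation.
classes, counts = np.unique(mask, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # → {0: 6, 1: 4, 2: 2}
```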

What are the use cases for semantic segmentation?

Semantic segmentation models are useful when you need to know exactly which pixels an object occupies, rather than just a bounding box around it. For example, if there are two birds in an image, every pixel belonging to either bird receives the label "bird"; if you also need to tell the two birds apart, use an instance segmentation model instead.

Here are a few scenarios where semantic segmentation is useful:

  • Detecting tumors in MRI scans.
  • Providing detailed information to a self-driving car about its surroundings.
  • Detecting dents and scratches on a vehicle.

What models are used for semantic segmentation?

The SegFormer model achieves state-of-the-art results in semantic segmentation. SegFormer is designed to work on images of any resolution without significantly impacting inference performance.

Where can I learn more about semantic segmentation?

See more learning resources

Deploy a computer vision model today

Join 100k developers curating high quality datasets and deploying better models with Roboflow.

Get started