Top Image Tagging Models

Use image tagging models to assign one or more labels to an image.

Deploy select models (e.g., YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.

PaliGemma

PaliGemma is a vision language model (VLM) by Google that has multimodal capabilities.

Vision Language

Deploy with Roboflow

Segment Anything 3

Segment Anything 3 (SAM 3) is an image segmentation model released by Meta.

Instance Segmentation

Deploy with Roboflow

GPT-4o

GPT-4o is OpenAI’s third major iteration of GPT-4, expanding on the capabilities of GPT-4 with Vision.

Vision Language

Deploy with Roboflow

OpenAI CLIP

CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves strong results across a wide range of domains with no fine-tuning. It applies advances in large-scale transformers such as GPT-3 to computer vision.

Video Classification

Deploy with Roboflow
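
As a rough illustration of how zero-shot tagging with a CLIP-style model can work, the sketch below compares an image embedding against text embeddings for candidate labels and keeps every label whose cosine similarity clears a threshold. The three-dimensional vectors, the label names, the `zero_shot_tags` helper, and the 0.7 threshold are all made up for illustration; a real model produces much higher-dimensional embeddings, and this is not the Roboflow or OpenAI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_tags(image_embedding, label_embeddings, threshold=0.7):
    """Return labels whose similarity to the image meets the threshold.

    Hypothetical helper: labels are returned from highest to lowest score,
    together with the full score dictionary.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    tags = [label for label, score in ranked if score >= threshold]
    return tags, scores


# Toy embeddings (assumed values, not output from any real model).
image_embedding = [0.9, 0.1, 0.3]
label_embeddings = {
    "dog": [1.0, 0.0, 0.2],
    "outdoors": [0.5, 0.1, 0.8],
    "car": [0.0, 1.0, 0.0],
}

tags, scores = zero_shot_tags(image_embedding, label_embeddings, threshold=0.7)
print(tags)  # ['dog', 'outdoors']
```

Because each label is scored independently, an image can receive several tags at once, which is what separates tagging from single-label classification. The threshold would need tuning for any real embedding model.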

CogVLM

CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks.

Vision Language

Deploy with Roboflow

SigLIP

SigLIP is an image embedding model introduced in the "Sigmoid Loss for Language Image Pre-Training" paper.

Deploy with Roboflow

MetaCLIP

MetaCLIP is a zero-shot classification and embedding model developed by Meta AI.

Deploy with Roboflow

MobileCLIP

MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper.

Deploy with Roboflow

GPT-4 with Vision

GPT-4 with Vision is a multimodal language model developed by OpenAI.

Object Detection

Deploy with Roboflow

BLIP

Deploy with Roboflow

BLIPv2

BLIPv2 is a multimodal model developed by Salesforce Research.

Deploy with Roboflow

Visual Question Answering

Image Similarity

Image Captioning

Zero-shot Detection

Real-Time Vision

Image Embedding

LLMs with Vision Capabilities

Multimodal Vision

Foundation Vision