Vision Transformer Alternatives

Explore alternatives to the Vision Transformer (ViT) classification model architecture.

Deploy select models (e.g. YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.


YOLOv5 Classification

YOLOv5 Classification is a version of the YOLOv5 model used in single-label and multi-label image classification.

Deploy with Roboflow

YOLOv8 Classification

An image classification model built using YOLOv8.

Deploy with Roboflow

Vision Transformer

The Vision Transformer (ViT) applies the Transformer architecture, originally developed for natural language processing models such as BERT, to images by treating an image as a sequence of patches.

Deploy with Roboflow
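
As a rough illustration of how a Vision Transformer prepares its input, the sketch below splits an image into fixed-size patches and flattens each one, which is the step that turns an image into a token-like sequence. It covers only the patching step (no linear projection, position embeddings, or encoder) and assumes a single-channel image stored as a list of rows; the function name `split_into_patches` is hypothetical and not part of any library.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened square patches.

    Patches are returned in row-major order, mirroring how a ViT turns
    an image into a sequence before embedding each patch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 "image" whose pixel values are 0..15 (illustrative data only).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)
```

In a real model each flattened patch would then be linearly projected and combined with a position embedding before entering the Transformer encoder.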

EfficientNet

EfficientNet is a family of image classification models from Google AI that train comparatively quickly on small amounts of data, making the most of limited datasets.

Deploy with Roboflow

SigLIP

SigLIP is an image embedding model defined in the "Sigmoid Loss for Language Image Pre-Training" paper.

Deploy with Roboflow

MetaCLIP

MetaCLIP is a zero-shot classification and embedding model developed by Meta AI.

Deploy with Roboflow
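
Zero-shot classification with a CLIP-style embedding model generally works by comparing an image embedding against the embeddings of candidate text labels and picking the closest one. The sketch below shows that comparison step only, using cosine similarity followed by a softmax. The embeddings are made-up toy vectors rather than real model output, the `temperature` value is an assumption, and `zero_shot_classify` is a hypothetical helper, not an API of MetaCLIP or Roboflow.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Return a probability per label from scaled cosine similarities.

    temperature is an assumed scaling factor; real models use their own
    learned value.
    """
    labels = list(label_embeddings)
    logits = [
        temperature * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    # Numerically stable softmax over the label logits.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings (illustrative only, not produced by any model).
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
scores = zero_shot_classify(image_embedding, label_embeddings)
best = max(scores, key=scores.get)
```

With a real model, the image and text embeddings would come from the model's image and text encoders, and the label prompts can be changed without retraining.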

ResNet 32

A fast, simple convolutional neural network that gets the job done for many tasks, including classification.

Deploy with Roboflow

ResNet-50

ResNet-50 is a popular image classification model architecture.

Deploy with Roboflow

AltCLIP

AltCLIP is a zero-shot image classification model.

Deploy with Roboflow

RemoteCLIP

RemoteCLIP is a zero-shot classification model for remote sensing.

Deploy with Roboflow

BioCLIP

BioCLIP is a vision foundation model for the Tree of Life.

Deploy with Roboflow

MobileCLIP

MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper.

Deploy with Roboflow

BLIP

Deploy with Roboflow

BLIPv2

BLIPv2 is a multimodal model developed by Salesforce Research.

Deploy with Roboflow

ALBEF

Deploy with Roboflow

FastViT

FastViT is a fast image classification model developed by Apple.

Deploy with Roboflow

ResNet 34

A fast, simple convolutional neural network that gets the job done for many tasks, including classification.

Deploy with Roboflow

MobileNet V2 Classification

MobileNet is a Google AI model well-suited for on-device, real-time classification (distinct from MobileNetSSD, the Single Shot Detector variant). This implementation uses transfer learning from ImageNet to adapt to your dataset.

Deploy with Roboflow

Visual Question Answering

Image Similarity

Image Captioning

Zero-shot Detection

Real-Time Vision

Image Embedding

LLMs with Vision Capabilities

Multimodal Vision

Foundation Vision