Vision Models by Apple

Explore computer vision and multimodal models developed by Apple.

Deploy select models (i.e. YOLOv8, CLIP) using the Roboflow Hosted API, or your own hardware using Roboflow Inference.

Showing

of

models.

4M

The 4M model is a versatile multimodal Transformer model developed by EPFL and Apple, capable of handling a handful of vision and language tasks.

Object Detection

Deploy with Roboflow

MobileCLIP

MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper

Deploy with Roboflow

FastViT

FastViT is a fast image classification model developed by Apple.

Deploy with Roboflow

Visual Question Answering

Image Similarity

Image Captioning

Zero-shot Detection

Real-Time Vision

Image Embedding

LLMS with Vision Capabilities

Multimodal Vision

Foundation Vision