Azure Image Analysis Alternatives

Azure's Image Analysis enables you to identify objects and text in images without prior training. Below, we list a series of alternatives that you can use to identify objects in images and videos.

All models listed below can be deployed on the edge (e.g., on an NVIDIA Jetson or Raspberry Pi) or through a hosted, scalable API.

If you're more interested in deploying a model without code, check out our Roboflow Deploy product.

Classification

Parameters: 428 million

CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves strong results across a wide range of domains with no fine-tuning. It applies recent advances in large-scale transformers, such as those behind GPT-3, to computer vision.
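
To illustrate how zero-shot classification of this kind works, here is a minimal sketch. The image and each candidate text prompt are embedded into a shared vector space, and the prompt whose embedding is most similar to the image embedding is chosen. The vectors below are hand-written toy values, not real CLIP outputs, and the helper name `zero_shot_classify` and the temperature of 100 are illustrative assumptions rather than any library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Score each text label against the image; no training step is involved.

    `temperature` is an assumed scaling factor applied to the similarities
    before the softmax.
    """
    labels = list(label_embeddings)
    sims = [cosine_similarity(image_embedding, label_embeddings[name]) for name in labels]
    probs = softmax([temperature * s for s in sims])
    return dict(zip(labels, probs))


# Toy embeddings (hypothetical values standing in for real encoder outputs).
toy_image = [0.9, 0.1, 0.2]
toy_labels = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}

probabilities = zero_shot_classify(toy_image, toy_labels)
best_label = max(probabilities, key=probabilities.get)
```

Because the labels are ordinary text prompts, changing the set of classes only means changing the dictionary keys and their embeddings. With a real model, both the image and the prompts would be passed through the model's encoders instead of being written by hand.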

Deploy on the edge and at scale with Roboflow Inference

Run model on videos with Roboflow Hosted Video Inference

Object Detection

Parameters: 68.2 million

Architecture: YOLO, CNN

YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5.

Object Detection

DocTR is an optical character recognition (OCR) tool powered by deep learning.

Multimodal Model

Model Size: 4000000000.0 MB

Parameters: 3 billion

PaliGemma is a vision-language model (VLM) from Google that takes both images and text as input.

Multimodal Model

Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind and designed specifically for multimodality.

Deploy a computer vision model today

Join 800,000+ developers curating high quality datasets and deploying better models with Roboflow.
