Google Cloud Vision Object Detection Alternatives

Google Cloud Vision's object detection API enables you to identify objects in images without prior training. Below, we list a series of alternatives that you can use to identify objects in images and videos.

All models listed below can be deployed on the edge.

If you're more interested in deploying a model without code, check out our Roboflow Deploy product.

Multimodal Model

Parameters: 3 Billion

PaliGemma is a vision-language model (VLM) developed by Google that accepts both images and text as input.

Deploy on the edge and at scale with Roboflow Inference

Run model on videos with Roboflow Hosted Video Inference

Multimodal Model

Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind and designed specifically for multimodality.

Object Detection

DocTR is an Optical Character Recognition (OCR) tool powered by deep learning.

Object Detection

Architecture: YOLO, CNN

YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5.

Classification

CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves strong results across a wide range of domains with no fine-tuning. It applies advances in large-scale transformers, such as those behind GPT-3, to computer vision.
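
To give a sense of what "zero-shot" means here, the sketch below shows the scoring step in plain Python: an image embedding is compared with one text embedding per candidate label using cosine similarity, and the scaled similarities go through a softmax. The three-dimensional vectors are made up for illustration. In practice they would come from CLIP's image and text encoders and would be much larger. The function names and the `scale` value are assumptions chosen for this example, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, scale=100.0):
    """Score one image embedding against a dict of label -> text embedding.

    `scale` sharpens the softmax; the value here is an illustrative assumption.
    """
    labels = list(text_embeddings)
    logits = [
        scale * cosine_similarity(image_embedding, text_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Made-up embeddings, for illustration only (real ones come from the model).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}

probabilities = zero_shot_classify(image_embedding, text_embeddings)
best_label = max(probabilities, key=probabilities.get)
print(best_label)
```

Because the labels are ordinary text prompts, you can change the set of classes by editing the dictionary keys (and their embeddings) rather than retraining a model.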

Deploy a computer vision model today

Join 250,000+ developers curating high quality datasets and deploying better models with Roboflow.
