Use the widget below to experiment with Google Gemini. You can detect COCO classes such as people, vehicles, animals, and household items.
Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind that focuses specifically on multimodality, working across visual, audio, and text formats.
Gemini is the name of both the underlying family of LMMs and a consumer chatbot interface, formerly named Bard, that uses the Gemini models.
Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the largest and “most capable” model, Pro is the middle option, and Nano is the smallest, capable of on-device inference.
Gemini 1.5 Pro is best known for its large context window of 1,000,000 tokens (roughly 400,000-750,000 words).
Gemini's three models have been benchmarked on a variety of tasks. Across most tests, Gemini 1.0 Ultra and Gemini 1.5 Pro demonstrate strong performance on math problems, science diagram understanding, video captioning, and similar tasks.
You can use Roboflow Inference to deploy a Google Gemini API on your own hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
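As a rough illustration of what this can look like, the sketch below uses the `inference_sdk` Python client to call a locally running Inference server. It assumes you have already started an Inference server (for example with the `inference-cli` package) and created a Roboflow Workflow in your workspace that includes a Google Gemini block; the server address, API key, workspace name, and workflow ID are placeholders, not values from this page.

```python
# Minimal sketch: query a self-hosted Roboflow Inference server from Python.
# Assumptions (not from the original text): an Inference server is running on
# localhost:9001, and a Workflow wrapping Gemini exists in your workspace.
# The workspace name, workflow ID, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

# Point the client at your self-hosted Inference server (CPU or GPU deployment).
client = InferenceHTTPClient(
    api_url="http://localhost:9001",       # your local Inference server
    api_key="YOUR_ROBOFLOW_API_KEY",       # your Roboflow API key
)

# Run the Workflow that wraps Gemini; replace the names with your own.
result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-gemini-workflow",
    images={"image": "image.jpg"},         # local path or URL to the input image
)

print(result)
```

This is only one way to structure the call; consult the Roboflow Inference documentation for the exact setup steps for your device.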