Use the widget below to experiment with Google Gemini. You can detect COCO classes such as people, vehicles, animals, and household items.
Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind that focuses specifically on multimodality, working across visual, audio, and text formats.
Gemini is the name of both the underlying family of LMMs and a consumer chatbot interface, formerly named Bard, that uses the Gemini models.
Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the largest and “most capable” model, Pro is the middle option, and Nano is the smallest, capable of on-device inference.
Gemini 1.5 Pro is best known for its large context window of 1,000,000 tokens (roughly 400,000-750,000 words).
Gemini's three models have been benchmarked on a variety of tasks. Across most tests, Gemini 1.0 Ultra and Gemini 1.5 Pro demonstrate strong performance on math problems, science diagram understanding, video captioning, and similar tasks.
You can use Roboflow Inference to deploy a Google Gemini API on your own hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
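As a rough illustration of what this can look like, the sketch below uses the `inference_sdk` Python client to call a locally running Inference server. It assumes you have already started an Inference server (for example with the `inference-cli` package) and created a Roboflow Workflow in your workspace that includes a Google Gemini block; the server address, API key, workspace name, and workflow ID are placeholders, not values from this page.

```python
# Minimal sketch: query a self-hosted Roboflow Inference server from Python.
# Assumptions (not from the original text): an Inference server is running on
# localhost:9001, and a Workflow wrapping Gemini exists in your workspace.
# The workspace name, workflow ID, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

# Point the client at your self-hosted Inference server (CPU or GPU deployment).
client = InferenceHTTPClient(
    api_url="http://localhost:9001",       # your local Inference server
    api_key="YOUR_ROBOFLOW_API_KEY",       # your Roboflow API key
)

# Run the Workflow that wraps Gemini; replace the names with your own.
result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-gemini-workflow",
    images={"image": "image.jpg"},         # local path or URL to the input image
)

print(result)
```

This is only one way to structure the call; consult the Roboflow Inference documentation for the exact setup steps for your device.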