Overview
Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind. The models are built specifically for multimodality and work with visual, audio, and text formats.
Gemini is the name of both the underlying LMM and a consumer chatbot interface, formerly named Bard, that uses the Gemini models.
Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the largest and “most capable” model, Pro is the middle option, and Nano is the smallest, capable of on-device inference.
Performance
Use This Model
Label Data Automatically with Google Gemini
You can automatically label a dataset using Google Gemini with help from Autodistill, an open-source package for training computer vision models. With Autodistill, you can label a folder of images automatically with only a few lines of code.
Deploy to Production
Roboflow offers a range of SDKs with which you can deploy your model to production.
Curious about how this model compares to others? Check out our model comparisons.
Convert Annotation Format
YOLOv8 uses the YOLOv8 PyTorch TXT annotation format. If your annotations are in a different format, you can use Roboflow's annotation conversion tools to get your data into the right format.
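To give a sense of what such a conversion involves, here is a minimal sketch in Python. It assumes the source annotation is a pixel-space box given as `[x_min, y_min, width, height]` (as in COCO-style annotations) and that the target is a YOLO TXT line of the form `class x_center y_center width height`, with all four coordinates normalized to the image size. The function name `coco_box_to_yolo_line` is hypothetical and is not part of Roboflow's tooling, which handles many more formats and edge cases.

```python
def coco_box_to_yolo_line(class_id, box, image_width, image_height):
    """Convert one pixel-space [x_min, y_min, width, height] box
    into a single YOLO TXT annotation line.

    Hypothetical helper for illustration only.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    x_min, y_min, box_width, box_height = box

    # YOLO TXT stores the box center, not the top-left corner,
    # and normalizes every value to the 0-1 range.
    x_center = (x_min + box_width / 2) / image_width
    y_center = (y_min + box_height / 2) / image_height
    norm_width = box_width / image_width
    norm_height = box_height / image_height

    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{norm_width:.6f} {norm_height:.6f}"
    )


if __name__ == "__main__":
    # A 200x100 px box whose top-left corner is at (100, 50)
    # in a 400x200 px image.
    print(coco_box_to_yolo_line(0, [100, 50, 200, 100], 400, 200))
```

In the YOLO TXT layout, each image typically has a matching `.txt` file with one such line per object, so a full conversion would loop over every annotation and group the lines by image.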