Use the widget below to experiment with Gemma 3. You can detect COCO classes such as people, vehicles, animals, and household items.
Gemma 3 is a multimodal language model developed by Google. The model comes in four sizes: 1B, 4B, 12B, and 27B. In general, the larger the model you use, the more accurate its outputs. The model is released under Google's custom Gemma license.
The 1B model has a context window of 32K tokens, whereas the other model sizes have a context window of 128K tokens, a 16x increase over previous models in the Gemma series. 128K tokens is enough to fit thousands of words of text or multiple images in a single prompt.
The model comes with instruction-tuned checkpoints that you can use to have conversations with the model. These checkpoints are available for download on Kaggle and Hugging Face. You can also use the models in Google AI Studio.
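If you want to try one of the Hugging Face checkpoints locally, here is a minimal sketch that loads the instruction-tuned 1B model with the Transformers text-generation pipeline. It assumes a recent transformers release with Gemma 3 support and that you have accepted the Gemma license on Hugging Face; the model ID and prompt are illustrative.

```python
# Minimal sketch: chat with the instruction-tuned Gemma 3 1B checkpoint
# using the Hugging Face Transformers text-generation pipeline.
# Assumes a recent transformers release with Gemma 3 support and that you
# have accepted the Gemma license on Hugging Face; the prompt is illustrative.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

messages = [
    {"role": "user", "content": "In one sentence, what can a 128K context window hold?"}
]

result = pipe(messages, max_new_tokens=128)

# The pipeline returns the chat history with the model's reply appended last.
print(result[0]["generated_text"][-1]["content"])
```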
Here is how Gemma 3 performs on our qualitative multimodal tests:
Gemma 3 is licensed under the Gemma license.
You can use Roboflow Inference to deploy a Gemma 3 API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
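As a quick preview of what querying your deployment can look like, here is a hedged sketch that sends a request to a locally running Inference server with the inference-sdk Python client. The workspace name, workflow ID, API key, and image path are placeholders for your own setup, and the server is assumed to already be running on its default port.

```python
# Minimal sketch: send a request to a locally running Roboflow Inference server.
# Assumes the server is already running on its default port (9001); the
# workspace name, workflow ID, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # default local Inference server address
    api_key="YOUR_ROBOFLOW_API_KEY",  # placeholder API key
)

result = client.run_workflow(
    workspace_name="your-workspace",        # placeholder workspace
    workflow_id="your-gemma-3-workflow",    # placeholder workflow that uses Gemma 3
    images={"image": "path/to/image.jpg"},  # image to send alongside your prompt
)

print(result)
```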