Use the widget below to experiment with Gemma 3. You can detect COCO classes such as people, vehicles, animals, and household items.
Gemma 3 is a multimodal language model developed by Google. The model comes in four sizes: 1B, 4B, 12B, and 27B. In general, the larger the model you use, the more accurate its outputs. The model is released under Google's custom Gemma license.
The 1B model has a context window of 32K tokens, whereas the other model sizes have a context window of 128K tokens, a 16x increase over previous models in the Gemma series. 128K tokens is enough to fit thousands of words of text or multiple images in a single prompt.
The model comes with instruction-tuned checkpoints that you can use to have conversations with the model. These checkpoints are available for download on Kaggle and Hugging Face. You can also use the models in Google AI Studio.
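If you want to try one of the Hugging Face checkpoints locally, here is a minimal sketch that loads the instruction-tuned 1B model with the Transformers text-generation pipeline. It assumes a recent transformers release with Gemma 3 support and that you have accepted the Gemma license on Hugging Face; the model ID and prompt are illustrative.

```python
# Minimal sketch: chat with the instruction-tuned Gemma 3 1B checkpoint
# using the Hugging Face Transformers text-generation pipeline.
# Assumes a recent transformers release with Gemma 3 support and that you
# have accepted the Gemma license on Hugging Face; the prompt is illustrative.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

messages = [
    {"role": "user", "content": "In one sentence, what can a 128K context window hold?"}
]

result = pipe(messages, max_new_tokens=128)

# The pipeline returns the chat history with the model's reply appended last.
print(result[0]["generated_text"][-1]["content"])
```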
Here is how Gemma 3 performs on our qualitative multimodal tests:
Gemma 3 is licensed under the Gemma license.
You can use Roboflow Inference to deploy a Gemma 3 API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
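As a quick preview of what querying your deployment can look like, here is a hedged sketch that sends a request to a locally running Inference server with the inference-sdk Python client. The workspace name, workflow ID, API key, and image path are placeholders for your own setup, and the server is assumed to already be running on its default port.

```python
# Minimal sketch: send a request to a locally running Roboflow Inference server.
# Assumes the server is already running on its default port (9001); the
# workspace name, workflow ID, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # default local Inference server address
    api_key="YOUR_ROBOFLOW_API_KEY",  # placeholder API key
)

result = client.run_workflow(
    workspace_name="your-workspace",        # placeholder workspace
    workflow_id="your-gemma-3-workflow",    # placeholder workflow that uses Gemma 3
    images={"image": "path/to/image.jpg"},  # image to send alongside your prompt
)

print(result)
```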