Use the widget below to experiment with Moondream 2. You can detect COCO classes such as people, vehicles, animals, and household items.
Moondream 2 is the latest model in the Moondream series of “tiny vision language models”. The model, developed by vikhyat, is trained to perform a wide range of tasks, from VQA to image captioning to object detection and identifying the x-y coordinates of regions in an image. The model is licensed under an Apache 2.0 license.
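Point-style outputs from vision language models like Moondream are commonly returned as normalized coordinates in the 0–1 range, which you then scale to your image size. Below is a minimal sketch of that conversion; the `{"x": ..., "y": ...}` dict format is an assumption for illustration, so check the model's actual return value before relying on it:

```python
def to_pixel_points(points, image_width, image_height):
    """Convert normalized {'x', 'y'} points (0-1 range) to integer pixel coords.

    The dict keys are an assumed output format, not a documented contract.
    """
    return [
        (round(p["x"] * image_width), round(p["y"] * image_height))
        for p in points
    ]

# Example: two normalized points scaled onto a 640x480 image
print(to_pixel_points([{"x": 0.5, "y": 0.5}, {"x": 0.25, "y": 0.1}], 640, 480))
# → [(320, 240), (160, 48)]
```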
Moondream 2 can run on both CPUs and GPUs. You can run the model with the moondream Python package or through the Hugging Face Transformers Python package. According to the project repository, the moondream Python package does not support GPUs at the time of writing, although this may change in the future.
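For CPU inference with the moondream package, usage looks roughly like the sketch below. The `md.vl()` entry point and the quantized `.mf` weights file name are assumptions based on the package's README; verify both against the project repository before use.

```python
# pip install moondream
# Assumes a quantized Moondream weights file (.mf) has already been
# downloaded locally; the filename below is a placeholder.
import moondream as md
from PIL import Image

model = md.vl(model="moondream-0.5b-int8.mf")  # CPU-only at time of writing
image = Image.open("example.jpg")
print(model.caption(image)["caption"])
```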
The Moondream Transformers implementation has four modes of inference: image captioning, visual question answering (VQA), object detection, and pointing (returning x-y coordinates for a query).
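A sketch of invoking these modes through Transformers is below. The `caption`, `query`, `detect`, and `point` method names and their return keys reflect the Moondream repository at the time of writing; check the current model card before relying on them.

```python
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    trust_remote_code=True,  # Moondream ships custom inference code
)

image = Image.open("example.jpg")

# 1. Image captioning
print(model.caption(image, length="short")["caption"])

# 2. Visual question answering (VQA)
print(model.query(image, "How many people are in this image?")["answer"])

# 3. Object detection: bounding boxes for the named class
print(model.detect(image, "person")["objects"])

# 4. Pointing: x-y coordinates for the named class
print(model.point(image, "person")["points"])
```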
Here is how Moondream performs when evaluated on various qualitative tests:
You can use Roboflow Inference to deploy a Moondream 2 API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.