Try the Model

Use the widget below to experiment with OWL ViT. You can detect COCO classes such as people, vehicles, animals, and household items.

Overview

OWL-ViT is a transformer-based, open-vocabulary object detection model developed by Google Research.

OWL ViT License

OWL ViT is licensed under an Apache 2.0 license.

Deploy an OWL ViT API

You can use Roboflow Inference to deploy an OWL ViT API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).

Below are instructions on how to deploy your own model API.

First, install Autodistill and Autodistill OWL-ViT:


pip install autodistill autodistill-owl-vit

Then, run:


from autodistill_owl_vit import OWLViT
from autodistill.detection import CaptionOntology

# Define an ontology that maps OWL-ViT prompts to class names.
# The ontology dictionary has the format {caption: class}, where caption is
# the prompt sent to the base model and class is the label saved for that
# caption in the generated annotations.
# Then, load the model with that ontology.
base_model = OWLViT(
    ontology=CaptionOntology(
        {
            "person": "person",
            "a forklift": "forklift"
        }
    )
)

# Run inference on a single image and print the detections.
result = base_model.predict("image.jpeg")
print(result)
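
The ontology above maps each caption (the prompt sent to the model) to the class label saved in the annotations. As a minimal sketch, assuming you start from a plain list of class names, a small helper can build that dictionary. The names `build_ontology_mapping` and `article` are hypothetical and used only for illustration; they are not part of Autodistill.

```python
def build_ontology_mapping(class_names, article="a"):
    """Build a {caption: class} dictionary from a list of class names.

    Hypothetical helper for illustration only (not an Autodistill function).
    Each caption is the class name prefixed with `article`; pass article=""
    to use the bare class name as the prompt.
    """
    mapping = {}
    for name in class_names:
        label = name.strip()
        if not label:
            # Skip empty entries rather than creating an empty prompt.
            continue
        caption = f"{article} {label}" if article else label
        mapping[caption] = label
    return mapping


mapping = build_ontology_mapping(["person", "forklift"])
print(mapping)  # {'a person': 'person', 'a forklift': 'forklift'}
```

The resulting dictionary has the same {caption: class} shape as the one passed to `CaptionOntology` in the example above.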

Label Data Automatically with OWL ViT

You can automatically label a dataset using OWL ViT with help from Autodistill, an open-source package for training computer vision models. A folder of images can be labeled automatically with only a few lines of code.
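
To illustrate what labeling a folder involves, the sketch below loops over the image files in a directory and calls `predict` on each one, the same method used in the deployment example. The helper `predict_folder`, the `IMAGE_EXTENSIONS` set, and the folder layout are assumptions made for illustration. This is not Autodistill's own labeling workflow, only a rough picture of the per-image loop.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def predict_folder(model, folder):
    """Run model.predict on every image file in `folder`.

    Hypothetical helper: `model` is any object with a predict(path) method,
    such as the base_model created earlier. Returns {filename: prediction}.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Only process regular files with a known image extension.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            results[path.name] = model.predict(str(path))
    return results
```

With the `base_model` from the earlier example, `predict_folder(base_model, "./images")` would return one prediction per image; `./images` is a placeholder path.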