Use the widget below to experiment with Grounded SAM. You can detect COCO classes such as people, vehicles, animals, and household items.
GroundedSAM combines Grounding DINO with the Segment Anything Model to identify and segment objects in an image given text captions.
This gives GroundedSAM the ability to label detections and segmentations with captions. Additionally, it can segment objects based on text prompts.
GroundedSAM is also a zero-shot model, which allows it to detect objects it was not trained on. This is perfect for auto-labeling tasks. Learn how to use GroundedSAM to autolabel your dataset.
Grounded SAM is licensed under an Apache 2.0 license.
Using a variety of day-to-day objects, researchers found the three versions of GroundedSAM to be exceptional at zero-shot prediction on the data.
You can use Roboflow Inference to deploy a Grounded SAM API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
First, install Autodistill and Autodistill Grounded SAM:
pip install autodistill-grounded-sam autodistill-yolov8
Then, run:
from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
from autodistill.utils import plot
import cv2
# define an ontology to map class names to our GroundedSAM prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GroundedSAM(
    ontology=CaptionOntology(
        {
            "person": "person",
            "shipping container": "shipping container",
        }
    )
)
# run inference on a single image
results = base_model.predict("logistics.jpeg")
plot(
    image=cv2.imread("logistics.jpeg"),
    classes=base_model.ontology.classes(),
    detections=results
)
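
# Optionally, inspect the raw predictions. Autodistill base models return a
# supervision `Detections` object, so class IDs and confidences can be read
# directly (a minimal sketch; the field names follow the supervision library)
for class_id, confidence in zip(results.class_id, results.confidence):
    print(base_model.ontology.classes()[class_id], confidence)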
# label all images in a folder called `context_images`
base_model.label("./context_images", extension=".jpeg")
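Once your images are labeled, you can train a smaller, faster model on the resulting dataset using the autodistill-yolov8 package installed earlier. The snippet below is a minimal sketch; the dataset path assumes label() wrote its output to a folder named context_images_labeled (Autodistill's default appends _labeled to the input folder name), so adjust the path if your output location differs.
from autodistill_yolov8 import YOLOv8

# train a YOLOv8 target model on the auto-labeled dataset
# (the data.yaml path is an assumption based on Autodistill's default output folder naming)
target_model = YOLOv8("yolov8n.pt")
target_model.train("./context_images_labeled/data.yaml", epochs=50)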