Use the widget below to experiment with Segment Anything 2. You can detect COCO classes such as people, vehicles, animals, and household items.
Segment Anything 2 (SAM 2) is a real-time image and video segmentation model. Unlike the previous version of SAM, which was built explicitly for use on images, SAM 2 works on both images and videos. You can use SAM 2 to identify the location of specific objects in an image or video.
There are two ways you can run SAM 2: with the automatic mask generator, or with a prompt.
The automatic mask generator is ideal if you want to segment all objects. Using a prompt, on the other hand, allows you to be more specific in your segmentation.
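If you want to try the first option locally, below is a minimal sketch of the automatic mask generator using the `sam2` package from the official SAM 2 repository. The checkpoint path, config name, and image file name are assumptions and may differ for your install.

import cv2
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

# assumed checkpoint and config names; adjust to match your SAM 2 install
CHECKPOINT = "./checkpoints/sam2_hiera_large.pt"
MODEL_CFG = "sam2_hiera_l.yaml"

# build the model and wrap it in the automatic mask generator,
# which segments every object it can find without a prompt
sam2_model = build_sam2(MODEL_CFG, CHECKPOINT)
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)

# load an image as RGB (OpenCV loads images as BGR by default)
image = cv2.cvtColor(cv2.imread("image.jpeg"), cv2.COLOR_BGR2RGB)

# each entry is a dictionary with a binary "segmentation" mask plus
# metadata such as "area", "bbox", and "predicted_iou"
masks = mask_generator.generate(image)
print(f"Generated {len(masks)} masks")

Each returned mask can then be drawn on the image or filtered by its area or predicted IoU.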
To identify the location of an object, you need to provide a “prompt”. A prompt can be a single point (or a series of points), a bounding box, or a mask. These prompts can be provided on an image or on a video frame.
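As an illustration of the prompted mode, here is a hedged sketch of prompting SAM 2 with a single point, again assuming the `sam2` package and the same checkpoint and config names as above; the image path and point coordinate are placeholders.

import cv2
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# assumed checkpoint and config names; adjust to match your SAM 2 install
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

# load the image as RGB and compute its embedding once
image = cv2.cvtColor(cv2.imread("image.jpeg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# prompt with one foreground point (label 1 = foreground, 0 = background)
point_coords = np.array([[450, 300]])  # placeholder (x, y) coordinate
point_labels = np.array([1])

# multimask_output=True returns several candidate masks with confidence scores
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]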
We have made an interactive playground that you can use to test SAM 2. In the widget below, upload an image, then run the playground.
The playground will aim to identify bounding boxes for every object in the image using Florence-2, a zero-shot detection model. Then, the playground will calculate segmentation masks for each bounding box using SAM 2.
It may take 15-30 seconds to see the result for your image.
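The box-to-mask step in this pipeline looks roughly like the sketch below, where a bounding box is passed to SAM 2 as a prompt. In the playground the box comes from Florence-2; here it is hard-coded, and the checkpoint, config, image path, and coordinates are assumptions.

import cv2
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# assumed checkpoint and config names; adjust to match your SAM 2 install
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = cv2.cvtColor(cv2.imread("image.jpeg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# a bounding box in (x0, y0, x1, y1) pixel format; in the playground this
# would come from a Florence-2 detection, here it is a placeholder
box = np.array([100, 150, 420, 560])

# ask SAM 2 for a single mask for the box prompt
masks, scores, _ = predictor.predict(box=box, multimask_output=False)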
Segment Anything 2 is licensed under an Apache 2.0 license.
You can use Roboflow Inference to deploy a Segment Anything 2 API on your hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
You can use Segment Anything 2 with Florence 2 as a grounded segmentation model. With this combination, you can provide a text prompt to Florence 2 to retrieve bounding boxes that correspond to objects. You can then use Segment Anything 2 to generate segmentation masks for the objects in each bounding box. To get started, install Autodistill Grounded SAM 2:
pip install autodistill-grounded-sam-2
To generate a segmentation mask for objects in an image, you can use the following code:
from autodistill_grounded_sam_2 import GroundedSAM2
from autodistill.detection import CaptionOntology
from autodistill.utils import plot
import cv2
# define an ontology to map class names to our Grounded SAM 2 prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GroundedSAM2(
    ontology=CaptionOntology(
        {
            "person": "person",
            "shipping container": "shipping container",
        }
    )
)

# run inference on a single image
results = base_model.predict("logistics.jpeg")

plot(
    image=cv2.imread("logistics.jpeg"),
    classes=base_model.ontology.classes(),
    detections=results
)
# label all images in a folder called `context_images`
base_model.label("./context_images", extension=".jpeg")
Above, replace "person" and "shipping container" with the text prompts that correspond to the objects you want to identify.