Use the widget below to experiment with Florence-2. You can detect COCO classes such as people, vehicles, animals, and household items.
Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. The model demonstrates strong zero-shot and fine-tuning capabilities across tasks such as captioning, object detection, grounding, and segmentation.
Despite its small size, it achieves results on par with models many times larger, like Kosmos-2. The model's strength lies not in a complex architecture but in the large-scale FLD-5B dataset, consisting of 126 million images and 5.4 billion comprehensive visual annotations.
Learn how to fine-tune Florence-2 here.
We have made an interactive playground that you can use to test Florence-2. In the widget below, upload an image, then run the playground.
The playground will aim to identify bounding boxes for every object in the image using Florence-2's open-ended object detection task type.
It may take several seconds to see the result for your image.
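If you want to run the same object detection task yourself, below is a minimal sketch using the Florence-2 weights on Hugging Face via Transformers. The `microsoft/Florence-2-base` model ID, file names, and generation settings are assumptions for illustration; adjust them for your setup.

# a minimal sketch of Florence-2's "<OD>" (object detection) task with
# Hugging Face Transformers; assumes transformers, torch, timm, and einops
# are installed, and that "image.jpeg" exists locally
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("image.jpeg")
task = "<OD>"  # Florence-2's object detection task prompt

inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# parse the raw text into {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}
parsed = processor.post_process_generation(
    generated_text, task=task, image_size=(image.width, image.height)
)
print(parsed[task])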
Florence-2 is licensed under an MIT license.
Compared to other generalist and specialist models, Florence-2 performs comparably to models many times larger than itself. On text visual question answering (TextVQA), Florence-2 outperforms all other existing specialist and generalist models.
Among zero-shot models, Florence-2 outperforms both Kosmos-2 and Flamingo, two much larger multimodal models.
Both images are taken from the Florence-2 paper.
You can use Roboflow Inference to deploy a Florence-2 API on your own hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
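As a rough sketch, you can start a local Inference server with the `inference-cli` package and query it over HTTP using the `inference-sdk` client. The `florence-2-base` model ID and the API key placeholder below are assumptions; check the Roboflow Inference documentation for the exact identifier.

# start a local Inference server first:
#   pip install inference-cli inference-sdk
#   inference server start
#
# then query the server; the "florence-2-base" model ID below is an
# assumption for illustration
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # local Inference server
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.infer("image.jpeg", model_id="florence-2-base")
print(result)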
To use Florence-2 with Autodistill, you need to install the following dependency:
pip3 install autodistill-florence-2
Then, run:
from autodistill_florence_2 import Florence2
from autodistill.detection import CaptionOntology
from PIL import Image
import supervision as sv

# define an ontology to map class names to our Florence-2 prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = Florence2(
    ontology=CaptionOntology(
        {
            "person": "person",
            "a forklift": "forklift"
        }
    )
)

# run inference on a single image
image = Image.open("image.jpeg")
detections = base_model.predict("image.jpeg")

# plot the predictions with supervision
bounding_box_annotator = sv.BoundingBoxAnnotator()
annotated_frame = bounding_box_annotator.annotate(
    scene=image.copy(),
    detections=detections
)

sv.plot_image(image=annotated_frame, size=(16, 16))

# label a folder of images
base_model.label("./context_images", extension=".jpeg")
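Once labeling finishes, you can train a smaller target model on the generated dataset. Below is a hedged sketch of the standard Autodistill flow using the `autodistill-yolov8` package; the `./context_images_labeled` output path assumes Autodistill's default of writing labels to a folder named after the input folder, so adjust it if you passed an explicit output folder.

# a minimal sketch of distilling the labels into a YOLOv8 target model;
# requires: pip3 install autodistill-yolov8
from autodistill_yolov8 import YOLOv8

target_model = YOLOv8("yolov8n.pt")
target_model.train("./context_images_labeled/data.yaml", epochs=200)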