Phi-4 Multimodal is a multimodal language model developed by Microsoft. The model accepts inputs in three modalities: text, images, and audio.
On performance, the official model announcement notes:
Despite its smaller size, the model maintains competitive performance on general multimodal capabilities, such as document and chart understanding, Optical Character Recognition (OCR), and visual science reasoning, matching or exceeding close models like Gemini-2-Flash-lite-preview/Claude-3.5-Sonnet.
Phi-4 Multimodal is licensed under an MIT license.
You can use Roboflow Inference to deploy a Phi-4 Multimodal API on your own hardware. You can deploy the model on CPU devices (e.g., Raspberry Pi, AI PCs) and GPU devices (e.g., NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
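As a rough illustration of the workflow, the sketch below queries a locally hosted model through the `inference_sdk` Python client, after installing the tooling with `pip install inference-cli inference-sdk` and starting a local server with `inference server start`. The model ID used here and the way a text prompt is passed are assumptions, not confirmed values; consult your Roboflow model page and the Inference documentation for the exact identifiers.

```python
# Minimal sketch: query a locally hosted Phi-4 Multimodal API served by
# Roboflow Inference. Assumes a server started with `inference server start`
# (default port 9001). The model ID below is a placeholder -- check your
# Roboflow model page for the exact identifier.
from inference_sdk import InferenceHTTPClient

# Point the client at the local inference server.
client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="YOUR_ROBOFLOW_API_KEY",  # from your Roboflow account settings
)

# Run the model on a local image. Parameter names for supplying a text or
# audio prompt may differ by model and Inference version.
result = client.infer("image.jpg", model_id="phi-4-multimodal")
print(result)
```

On GPU devices such as an NVIDIA Jetson or T4, the same client code applies; only the server host and hardware change.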