Use the widget below to experiment with LLaVA-1.5. Upload an image and ask the model questions about its contents, such as the people, vehicles, animals, or household items it contains.
LLaVA-1.5 is an open-source, multimodal language model. You can ask LLaVA-1.5 questions in text and optionally provide an image as context for your question. The code for LLaVA-1.5 was released to accompany the "Improved Baselines with Visual Instruction Tuning" paper.
The authors of the paper note in the abstract: "With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art [performance] across 11 benchmarks."
LLaVA-1.5 is available in an online demo playground where you can experiment with the model.
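If you would rather run LLaVA-1.5 yourself than use the hosted demo, the weights are also published on Hugging Face. Below is a minimal sketch, assuming the llava-hf/llava-1.5-7b-hf checkpoint, a transformers release that includes LLaVA support, and a GPU for half-precision inference; the image path and prompt are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumption: the llava-hf/llava-1.5-7b-hf checkpoint and a recent
# transformers version with LLaVA support; fp16 on a GPU is assumed.
model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# LLaVA-1.5 uses a simple USER/ASSISTANT prompt format with an <image> token.
prompt = "USER: <image>\nWhat is in this image?\nASSISTANT:"
image = Image.open("image.jpg")  # placeholder: any local test image

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```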
LLaVA-1.5 is licensed under an Apache-2.0 license.
Across multiple benchmarks, LLaVA-1.5 achieves state-of-the-art results, outperforming many other open multimodal models.
You can use Roboflow Inference to deploy a LLaVA-1.5 API on your own hardware. You can deploy the model on CPU devices (e.g. a Raspberry Pi or an AI PC) and GPU devices (e.g. an NVIDIA Jetson or NVIDIA T4).
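As a rough illustration of the pattern, once an Inference server is running locally you could query it over HTTP from Python. Note that the port, route, and payload field names in this sketch are assumptions for illustration only; follow the official instructions below for the exact setup.

```python
import base64
import requests

# Assumption: a local Roboflow Inference server is already running
# (e.g. started with the `inference server start` CLI command) and
# serves LLaVA-1.5 over HTTP on port 9001. The route and payload
# schema below are hypothetical placeholders -- consult the
# deployment instructions for the actual API.
SERVER_URL = "http://localhost:9001/llava/prompt"  # hypothetical route

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    SERVER_URL,
    json={
        "prompt": "What is in this image?",
        "image": {"type": "base64", "value": image_b64},  # hypothetical schema
    },
)
print(response.json())
```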
Below are instructions on how to deploy your own model API.