Use the widget below to experiment with BakLLaVA. You can ask questions about an image, including questions about common objects such as people, vehicles, animals, and household items.
BakLLaVA is a large multimodal model (LMM) developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA pairs a Mistral 7B base with the LLaVA 1.5 architecture. Used in combination with llama.cpp, a tool for running LLaMA-family models in C/C++, you can run BakLLaVA on a laptop, provided you have sufficient memory and compute resources available.
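As a sketch of what running BakLLaVA locally can look like, the snippet below uses the llama-cpp-python bindings for llama.cpp and their LLaVA 1.5 chat handler. The GGUF file names are placeholders: you would first download quantized BakLLaVA weights (the model file plus the CLIP projector file) and point the paths at them.

```python
import base64


def image_to_data_uri(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local image as a base64 data URI; the LLaVA chat handler
    accepts data URIs in place of web image URLs."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def describe_image(image_path: str, prompt: str = "Describe this image.") -> str:
    """Ask BakLLaVA a question about an image via llama-cpp-python.

    Requires `pip install llama-cpp-python` and locally downloaded
    BakLLaVA GGUF weights; the file names below are placeholders.
    """
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    llm = Llama(
        model_path="bakllava-1.Q4_K_M.gguf",  # placeholder model file
        chat_handler=Llava15ChatHandler(
            clip_model_path="mmproj-model-f16.gguf"  # placeholder projector file
        ),
        n_ctx=2048,  # image embeddings consume a large chunk of the context
    )
    result = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": image_to_data_uri(image_path)}},
                {"type": "text", "text": prompt},
            ],
        }]
    )
    return result["choices"][0]["message"]["content"]
```

On a machine without a supported GPU, llama.cpp falls back to CPU inference; quantized weights (e.g., 4-bit) keep memory use low enough for many laptops.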
The model was trained using a large and diverse dataset, including 558K filtered image-text pairs, 158K GPT-generated multimodal instruction-following data, 450K academic-task-oriented VQA (Visual Question Answering) data, and 40K ShareGPT data.
BakLLaVA is also an open-source model, encouraging community engagement and future development.
BakLLaVA is licensed under an Apache-2.0 license.
Tested on a variety of benchmarks, BakLLaVA performs moderately well on OCR, VQA, and object detection tasks.
You can use Roboflow Inference to deploy a BakLLaVA API on your own hardware. You can deploy the model on CPU devices (e.g., a Raspberry Pi or an AI PC) and GPU devices (e.g., an NVIDIA Jetson or NVIDIA T4).
Below are instructions on how to deploy your own model API.
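Once a model API is running locally, a client typically sends the image as base64 inside a JSON body and reads the model's answer from the JSON response. The sketch below shows that pattern with only the Python standard library; the port, endpoint path, and payload field names are illustrative assumptions, so check the Roboflow Inference documentation for the exact schema.

```python
import base64
import json
import urllib.request


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes so they can travel in a JSON body."""
    return base64.b64encode(image_bytes).decode("utf-8")


def ask_bakllava(image_path: str, prompt: str,
                 server_url: str = "http://localhost:9001") -> str:
    """POST an image and a prompt to a locally running inference server.

    NOTE: the route and payload fields below are illustrative placeholders,
    not the documented Roboflow Inference schema.
    """
    with open(image_path, "rb") as f:
        body = json.dumps({
            "image": {"type": "base64", "value": encode_image(f.read())},
            "prompt": prompt,
        }).encode("utf-8")
    req = urllib.request.Request(
        f"{server_url}/llm/bakllava",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the client to the standard library makes it easy to run from edge devices like a Raspberry Pi without extra dependencies.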