Use the widget below to experiment with BakLLaVA. You can ask questions about an image, including questions about common objects such as people, vehicles, animals, and household items.
BakLLaVA is a large multimodal model (LMM) developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA pairs a Mistral 7B base with the LLaVA 1.5 architecture. Used in combination with llama.cpp, a tool for running LLaMA-family models in C/C++, you can run BakLLaVA on a laptop, provided you have sufficient memory and compute resources available.
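As a sketch of what running BakLLaVA locally can look like, the snippet below uses the llama-cpp-python bindings for llama.cpp and their LLaVA 1.5 chat handler. The GGUF file names are placeholders: you would first download quantized BakLLaVA weights (the model file plus the CLIP projector file) and point the paths at them.

```python
import base64


def image_to_data_uri(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local image as a base64 data URI; the LLaVA chat handler
    accepts data URIs in place of web image URLs."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def describe_image(image_path: str, prompt: str = "Describe this image.") -> str:
    """Ask BakLLaVA a question about an image via llama-cpp-python.

    Requires `pip install llama-cpp-python` and locally downloaded
    BakLLaVA GGUF weights; the file names below are placeholders.
    """
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    llm = Llama(
        model_path="bakllava-1.Q4_K_M.gguf",  # placeholder model file
        chat_handler=Llava15ChatHandler(
            clip_model_path="mmproj-model-f16.gguf"  # placeholder projector file
        ),
        n_ctx=2048,  # image embeddings consume a large chunk of the context
    )
    result = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": image_to_data_uri(image_path)}},
                {"type": "text", "text": prompt},
            ],
        }]
    )
    return result["choices"][0]["message"]["content"]
```

On a machine without a supported GPU, llama.cpp falls back to CPU inference; quantized weights (e.g., 4-bit) keep memory use low enough for many laptops.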
The model was trained using a large and diverse dataset, including 558K filtered image-text pairs, 158K GPT-generated multimodal instruction-following data, 450K academic-task-oriented VQA (Visual Question Answering) data, and 40K ShareGPT data.
BakLLaVA is also an open-source model, encouraging community engagement and future development.
BakLLaVA is licensed under an Apache-2.0 license.
Tested on a variety of benchmarks, BakLLaVA performs moderately well on OCR, VQA, and object detection tasks.
You can use Roboflow Inference to deploy a BakLLaVA API on your own hardware. You can deploy the model on CPU devices (e.g., a Raspberry Pi or an AI PC) and GPU devices (e.g., an NVIDIA Jetson or NVIDIA T4).
Below are instructions on how to deploy your own model API.
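Once a model API is running locally, a client typically sends the image as base64 inside a JSON body and reads the model's answer from the JSON response. The sketch below shows that pattern with only the Python standard library; the port, endpoint path, and payload field names are illustrative assumptions, so check the Roboflow Inference documentation for the exact schema.

```python
import base64
import json
import urllib.request


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes so they can travel in a JSON body."""
    return base64.b64encode(image_bytes).decode("utf-8")


def ask_bakllava(image_path: str, prompt: str,
                 server_url: str = "http://localhost:9001") -> str:
    """POST an image and a prompt to a locally running inference server.

    NOTE: the route and payload fields below are illustrative placeholders,
    not the documented Roboflow Inference schema.
    """
    with open(image_path, "rb") as f:
        body = json.dumps({
            "image": {"type": "base64", "value": encode_image(f.read())},
            "prompt": prompt,
        }).encode("utf-8")
    req = urllib.request.Request(
        f"{server_url}/llm/bakllava",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the client to the standard library makes it easy to run from edge devices like a Raspberry Pi without extra dependencies.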