Overview
BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture. Combined with llama.cpp, a C/C++ inference engine for LLaMA-family models, BakLLaVA can run on a laptop, provided you have enough GPU (or CPU) resources available.
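As a rough sketch, running BakLLaVA locally through llama.cpp's LLaVA example might look like the following. The binary name and GGUF file names vary by llama.cpp version and by which quantized BakLLaVA weights you download, so treat the paths below as placeholders.

```shell
# Build llama.cpp (the LLaVA example ships with the main repository).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Run inference against an image. The model and projector file names are
# placeholders; substitute the BakLLaVA GGUF weights you downloaded.
./llava-cli \
  -m models/bakllava-q4_k.gguf \
  --mmproj models/mmproj-model-f16.gguf \
  --image photo.jpg \
  -p "Describe the contents of this image."
```

The `--mmproj` file is the multimodal projector that maps image embeddings into the language model's token space; LLaVA-style models need both files to answer questions about images.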
The model was trained on a large, diverse data mixture: 558K filtered image-text pairs, 158K GPT-generated multimodal instruction-following examples, 450K academic-task-oriented VQA (Visual Question Answering) examples, and 40K ShareGPT conversations.
BakLLaVA is also open source, which encourages community engagement and further development.
Performance
Tested on a variety of benchmarks, BakLLaVA performs moderately well on OCR, VQA, and object detection tasks.
You can automatically label a dataset using BakLLaVA with help from Autodistill, an open source package for training computer vision models. You can label a folder of images automatically with only a few lines of code.
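A minimal sketch of the Autodistill labeling workflow is shown below. The `autodistill_bakllava` module name is an assumption, as is the `BakLLaVA` class name; substitute whichever Autodistill module wraps BakLLaVA in your environment. The core pattern of a `CaptionOntology` plus a base model's `label()` call follows Autodistill's standard interface.

```python
# Sketch of Autodistill's auto-labeling workflow.
from autodistill.detection import CaptionOntology
from autodistill_bakllava import BakLLaVA  # assumed module and class name

# Map the prompts sent to the model to the class names you want
# written into the output annotations.
base_model = BakLLaVA(
    ontology=CaptionOntology({"a person": "person", "a forklift": "forklift"})
)

# Label every image in the input folder; the labeled dataset is
# written to the output folder.
base_model.label(input_folder="./images", output_folder="./dataset")
```

The resulting labeled dataset can then be used to train a smaller, faster supervised model on the same classes.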
Deploy to Production
Roboflow offers a range of SDKs with which you can deploy your model to production.
YOLOv8 uses the YOLOv8 PyTorch TXT annotation format. If your annotations are in a different format, you can use Roboflow's annotation conversion tools to get your data into the right format.