LLaVA-1.5 Alternatives
Explore alternatives to the LLaVA-1.5 multimodal vision language model.
Deploy select models (e.g., YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.
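For the self-hosted path, here is a minimal sketch using the open-source Roboflow Inference Python package (pip install inference); the model ID, API key, and image path below are placeholder assumptions, not values from this page.

```python
# Sketch: run an aliased public model locally with Roboflow Inference.
from inference import get_model

# Load a model by its Roboflow model ID; an API key is typically required.
model = get_model(model_id="yolov8n-640", api_key="YOUR_API_KEY")

# Run inference on a local image and print the predictions.
results = model.infer("example.jpg")
print(results)
```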
PaliGemma
PaliGemma is a vision language model (VLM) by Google that has multimodal capabilities.
Multimodal Model
Deploy with Roboflow
GPT-4o
GPT-4o is OpenAI's third major iteration of GPT-4, expanding on the capabilities of GPT-4 with Vision.
Multimodal Model
Deploy with Roboflow
CogVLM
CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks.
Multimodal Model
Deploy with Roboflow
Qwen-VL
Qwen-VL is an LMM developed by Alibaba Cloud. It accepts images, text, and bounding boxes as inputs, and can output text and bounding boxes. Qwen-VL naturally supports English, Chinese, and multilingual conversation. A minimal usage sketch follows this entry.
Multimodal Model
Deploy with Roboflow
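As a rough illustration of the image-plus-text interface described above, here is a sketch based on the remote-code chat API that the Qwen/Qwen-VL-Chat checkpoint publishes on Hugging Face; the image path and question are placeholder assumptions.

```python
# Sketch: ask Qwen-VL-Chat a question about an image using the custom
# chat interface the checkpoint ships via trust_remote_code.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Combine an image reference and a text question into a single query string.
query = tokenizer.from_list_format([
    {"image": "example.jpg"},  # placeholder image path
    {"text": "What objects are in this image?"},
])

# Returns a text answer; the model can also emit bounding boxes in its output.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```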
BakLLaVA
BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture.
Multimodal Model
Deploy with Roboflow
Anthropic Claude 3
Claude 3 is a family of multimodal models developed by Anthropic that can reason over both images and text.
Multimodal Model
Deploy with Roboflow
Google Gemini
Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind with a specific focus on multimodality.
Multimodal Model
Deploy with Roboflow
Phi-3.5
Phi-3.5 is a family of small language models developed by Microsoft; its vision variant, Phi-3.5-vision, accepts images and text as input.
Multimodal Model
Deploy with Roboflow