CogVLM is a multimodal model that answers questions about text and images. Unlike many multimodal models, CogVLM is open source and can be run on your own infrastructure. The paper reports that CogVLM achieves state-of-the-art performance on 9 benchmarks and ranks second on 4 others. In our testing, CogVLM performed well at a range of vision tasks, from visual question answering to document OCR.
There are currently two models available: CogVLM and CogVLM2. The models differ in their parameters and training: CogVLM2 has roughly 2 billion more parameters and is based on the Llama 3-8B architecture. CogVLM2 also supports Mandarin, which enables multilingual applications.
With Roboflow Inference, you can deploy CogVLM with minimal manual setup. Inference is a computer vision inference server with which you can deploy a range of state-of-the-art model architectures, from YOLOv8 to CLIP to CogVLM.
Inference enables you to run CogVLM with quantization. Quantization compresses the model, allowing you to run it with lower memory requirements (albeit with a slight accuracy trade-off). Using 4-bit quantization, you can run CogVLM on an NVIDIA T4 GPU. In our testing, requests with this configuration take around 10 seconds to process.
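As a rough sanity check on the memory requirement (a back-of-the-envelope sketch, assuming the 17B-parameter CogVLM checkpoint), 4-bit weights take about half a byte per parameter:
# Back-of-the-envelope memory estimate for 4-bit quantized weights (illustrative only)
num_params = 17e9        # CogVLM-17B parameter count
bytes_per_param = 0.5    # 4-bit quantization stores half a byte per weight
weights_gb = num_params * bytes_per_param / 1e9
print(f"~{weights_gb:.1f} GB of weights")  # ~8.5 GB, leaving headroom on a 16 GB NVIDIA T4
# Note: activations and the KV cache add memory overhead on top of the weights.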
To learn how to deploy CogVLM on your own infrastructure, refer to our guide on how to deploy CogVLM.
CogVLM is licensed under an Apache-2.0 license.
You can use Roboflow Inference to deploy a CogVLM API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
To use CogVLM with Inference, you will need a Roboflow API key. If you don't already have one, sign up for a free Roboflow account.
Then, retrieve your API key from the Roboflow dashboard. Learn how to retrieve your API key.
Run the following command to set your API key in your coding environment:
export ROBOFLOW_API_KEY=<your api key>
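Optionally, you can confirm the key is visible to your Python environment before continuing (a quick sanity check, not a required step):
import os

# Fail early if the ROBOFLOW_API_KEY environment variable is not set
assert os.environ.get("ROBOFLOW_API_KEY"), "Set ROBOFLOW_API_KEY before continuing"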
We recommend running CogVLM through the Inference HTTP API in a GPU environment. It's easy to set up with our inference-cli tool. Run the following commands to set up your environment and start the API at http://localhost:9001:
pip install inference inference-cli inference-sdk
inference server start
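If you want to confirm the server started before sending requests, one simple check (a minimal sketch, not part of the official workflow) is to verify that something is listening on port 9001:
import socket

# Confirm the Inference server is accepting connections on localhost:9001
with socket.create_connection(("localhost", 9001), timeout=5):
    print("Inference server is reachable on port 9001")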
Use the following code to send a question to CogVLM:
import os

from inference_sdk import InferenceHTTPClient

# Connect to the locally hosted Inference server
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",  # only local hosting is supported
    api_key=os.environ["ROBOFLOW_API_KEY"],
)

# Ask CogVLM a question about an image
result = CLIENT.prompt_cogvlm(
    visual_prompt="./forklift.jpg",
    text_prompt="Is there a forklift close to a conveyor belt?",
)

print(result)
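The client returns a dictionary. The generated answer is typically available under a response key, though the exact schema may vary between Inference versions, so treat the key below as an assumption:
# The response schema can vary between Inference versions; adjust the key if needed.
answer = result.get("response", result)
print(answer)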