CogVLM is a multimodal model that answers questions about text and images. Unlike many multimodal models, CogVLM is open source and can be run on your own infrastructure. In our testing, CogVLM performed well at a range of vision tasks, from visual question answering to document OCR.
With Roboflow Inference, you can deploy CogVLM with minimal manual setup. Inference is a computer vision inference server with which you can deploy a range of state-of-the-art model architectures, from YOLOv8 to CLIP to CogVLM.
Inference enables you to run CogVLM with quantization. Quantization compresses the model, allowing you to run it with lower memory requirements (albeit with a slight accuracy trade-off). With 4-bit quantization, you can run CogVLM on an NVIDIA T4 GPU. In our testing, requests with this configuration take around 10 seconds to process.
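As a back-of-envelope illustration of why 4-bit quantization makes the difference on a T4 (which has 16 GB of VRAM), consider the weight memory of a roughly 17-billion-parameter model such as CogVLM-17B at different precisions. This is a rough estimate for intuition only, not a description of Inference's internal quantization mechanism:

```python
# Rough weight-memory estimate for a ~17B-parameter model at different
# precisions. Ignores activation memory and framework overhead, so treat
# these numbers as lower bounds rather than exact footprints.
PARAMS = 17e9  # CogVLM-17B has roughly 17 billion parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Memory needed to store the model weights, in gigabytes."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(16)  # ~34 GB: too large for a 16 GB T4
int4_gb = weight_memory_gb(4)   # ~8.5 GB: fits with headroom to spare
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

At half precision the weights alone exceed the T4's memory; at 4 bits per parameter they fit comfortably, which is why the quantized configuration runs on that hardware.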
To learn how to deploy CogVLM on your own infrastructure, refer to our guide on how to deploy CogVLM.
To use CogVLM with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, sign up for a free Roboflow account.
Then, retrieve your API key from the Roboflow dashboard. Learn how to retrieve your API key.
Run the following command to set your API key in your coding environment:
export ROBOFLOW_API_KEY=<your api key>
We recommend running CogVLM through the Inference HTTP API, configured for a GPU environment. This is easy to set up with our inference-cli tool. Run the following commands to set up your environment and start the API at http://localhost:9001:
pip install inference inference-cli inference-sdk
inference server start
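Before sending requests, you can confirm the server came up and is reachable at port 9001. The sketch below makes a plain HTTP request to the server's root URL; the exact response the root path returns is not something this guide relies on, only that the server answers at all:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def server_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server responded, just with an error status -- it is up.
        return True
    except (URLError, OSError):
        return False

if server_is_up("http://localhost:9001"):
    print("Inference server is running")
else:
    print("Server not reachable -- did `inference server start` succeed?")
```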
Use the following code to send a question to CogVLM:
import os
from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",  # only local hosting supported
    api_key=os.environ["ROBOFLOW_API_KEY"]
)

result = CLIENT.prompt_cogvlm(
    visual_prompt="./forklift.jpg",
    text_prompt="Is there a forklift close to a conveyor belt?",
)

print(result)
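Under the hood, the SDK reads the image file, base64-encodes it, and posts it to the server alongside your prompt. If you prefer to call the HTTP API directly rather than through the SDK, a request body can be built as sketched below. Note that the `model_id` value and the `image`/`prompt` field names here are assumptions modeled on Inference's HTTP interface, so verify them against the server version you are running:

```python
import base64

def build_cogvlm_payload(image_path: str, prompt: str, api_key: str) -> dict:
    """Build a JSON-serializable body for a direct HTTP call to CogVLM.

    The field names (`image.type`/`image.value`, `prompt`) and the
    `model_id` value are assumptions based on Inference's HTTP interface;
    check them against the server version you are running.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "api_key": api_key,
        "model_id": "cogvlm-chat-hf",  # assumed model identifier
        "image": {"type": "base64", "value": image_b64},
        "prompt": prompt,
    }
```

You would then POST this body as JSON (with a `Content-Type: application/json` header) to the CogVLM route on your local server, using `requests` or the standard library.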