CogVLM is a multimodal model that answers questions about text and images. Unlike many multimodal models, CogVLM is open source and can be run on your own infrastructure. In our testing, CogVLM performed well at a range of vision tasks, from visual question answering to document OCR.
With Roboflow Inference, you can deploy CogVLM with minimal manual setup. Inference is a computer vision inference server with which you can deploy a range of state-of-the-art model architectures, from YOLOv8 to CLIP to CogVLM.
Inference enables you to run CogVLM with quantization. Quantization compresses the model, allowing you to run it with lower memory requirements (albeit with a slight accuracy trade-off). With 4-bit quantization, you can run CogVLM on an NVIDIA T4 GPU. In our testing, requests with this configuration take around 10 seconds to process.
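As a back-of-envelope illustration of why 4-bit quantization makes the difference on a T4 (which has 16 GB of VRAM), consider the weight memory of a roughly 17-billion-parameter model such as CogVLM-17B at different precisions. This is a rough estimate for intuition only, not a description of Inference's internal quantization mechanism:

```python
# Rough weight-memory estimate for a ~17B-parameter model at different
# precisions. Ignores activation memory and framework overhead, so treat
# these numbers as lower bounds rather than exact footprints.
PARAMS = 17e9  # CogVLM-17B has roughly 17 billion parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Memory needed to store the model weights, in gigabytes."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(16)  # ~34 GB: too large for a 16 GB T4
int4_gb = weight_memory_gb(4)   # ~8.5 GB: fits with headroom to spare
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

At half precision the weights alone exceed the T4's memory; at 4 bits per parameter they fit comfortably, which is why the quantized configuration runs on that hardware.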
To learn how to deploy CogVLM on your own infrastructure, refer to our guide on how to deploy CogVLM.
To use CogVLM with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, sign up for a free Roboflow account.
Then, retrieve your API key from the Roboflow dashboard. Learn how to retrieve your API key.
Run the following command to set your API key in your coding environment:
export ROBOFLOW_API_KEY=<your api key>
We recommend running CogVLM through the Inference HTTP API, configured for a GPU environment. This is easy to set up with our inference-cli tool. Run the following commands to set up your environment and start the API at http://localhost:9001:
pip install inference inference-cli inference-sdk
inference server start
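Before sending requests, you can confirm the server came up and is reachable at port 9001. The sketch below makes a plain HTTP request to the server's root URL; the exact response the root path returns is not something this guide relies on, only that the server answers at all:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def server_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server responded, just with an error status -- it is up.
        return True
    except (URLError, OSError):
        return False

if server_is_up("http://localhost:9001"):
    print("Inference server is running")
else:
    print("Server not reachable -- did `inference server start` succeed?")
```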
Use the following code to send a question to CogVLM:
import os
from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",  # only local hosting supported
    api_key=os.environ["ROBOFLOW_API_KEY"]
)

result = CLIENT.prompt_cogvlm(
    visual_prompt="./forklift.jpg",
    text_prompt="Is there a forklift close to a conveyor belt?",
)

print(result)
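Under the hood, the SDK reads the image file, base64-encodes it, and posts it to the server alongside your prompt. If you prefer to call the HTTP API directly rather than through the SDK, a request body can be built as sketched below. Note that the `model_id` value and the `image`/`prompt` field names here are assumptions modeled on Inference's HTTP interface, so verify them against the server version you are running:

```python
import base64

def build_cogvlm_payload(image_path: str, prompt: str, api_key: str) -> dict:
    """Build a JSON-serializable body for a direct HTTP call to CogVLM.

    The field names (`image.type`/`image.value`, `prompt`) and the
    `model_id` value are assumptions based on Inference's HTTP interface;
    check them against the server version you are running.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "api_key": api_key,
        "model_id": "cogvlm-chat-hf",  # assumed model identifier
        "image": {"type": "base64", "value": image_b64},
        "prompt": prompt,
    }
```

You would then POST this body as JSON (with a `Content-Type: application/json` header) to the CogVLM route on your local server, using `requests` or the standard library.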