In this guide, we are going to show how to deploy a CogVLM model to GCP using Roboflow Inference. Inference is a high-performance inference server with which you can run a range of vision models, from YOLOv8 to CLIP to CogVLM.
To deploy a CogVLM model to GCP, we will:
1. Set up our computing environment
2. Download the Roboflow Inference Server
3. Try out our model on an example image
Inference works with CogVLM models trained on Roboflow as well as models from custom training processes outside of Roboflow. If you are working with your own model, the full process is:
1. Train a model on (or upload a model to) Roboflow
2. Download the Roboflow Inference Server
3. Install the Python SDK to run inference on images
4. Try out the model on an example image
Let's get started!
If you want to upload your own model weights, first create a Roboflow account and create a new project. When you have created a new project, upload your project data, then generate a new dataset version. With that version ready, you can upload your model weights to Roboflow.
Install the Roboflow Python SDK:
pip install roboflow
Then, use the following script to upload your model weights:
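import os
from roboflow import Roboflow
# Replace these placeholders with your own values.
home = "/path/to/project/folder"  # folder that contains your yolov5 training runs
PROJECT_VERSION = 1  # placeholder: the dataset version number in your Roboflow project
# Read the API key from the ROBOFLOW_API_KEY environment variable
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])
project = rf.workspace().project("PROJECT_ID")
# Upload the trained weights from the training run folder
project.version(PROJECT_VERSION).deploy(model_type="yolov5", model_path=f"{home}/yolov5/runs/train/")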
You will need your project name, version, API key, and model weights. The following documentation shows how to retrieve your API key and project information:
- Retrieve your Roboflow project name and version
- Retrieve your API key
Change the path in the script above to the path where your model weights are stored.
When you have configured the script above, run the code to upload your weights to Roboflow.
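Before running the upload script, it can help to confirm that the API key is available and that the weights folder actually contains weights. The sketch below is a hypothetical pre-flight check, not part of the Roboflow SDK; it assumes the key is stored in the ROBOFLOW_API_KEY environment variable (as in the upload script) and that the weights are PyTorch `.pt` files, as YOLOv5 training runs produce.

```python
import os
from pathlib import Path


def preflight_check(model_path: str, env_var: str = "ROBOFLOW_API_KEY") -> list[str]:
    """Return a list of problems found before uploading weights (an empty list means OK).

    Hypothetical helper: it only inspects the local environment and filesystem.
    """
    problems = []
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    path = Path(model_path)
    if not path.is_dir():
        problems.append(f"model path {path} is not a directory")
    elif not any(path.rglob("*.pt")):
        # Assumption: weights are PyTorch .pt files somewhere under the training run folder.
        problems.append(f"no .pt weights files found under {path}")
    return problems


if __name__ == "__main__":
    # Placeholder path: use the same model_path you pass to deploy().
    issues = preflight_check("/path/to/project/folder/yolov5/runs/train/")
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Ready to upload.")
```

An empty list means both checks passed; anything else is printed so you can fix it before calling `deploy()`.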
Now you are ready to start deploying your model.
Open GCP Compute Engine and click the “Create Instance” button to create a virtual machine.
Next, configure your instance. The right configuration depends on your use case. If you are deploying a server for production, you may opt for a more powerful machine. If you are testing a model and plan to deploy on another machine in the future, a less powerful machine may be enough.
You must deploy on a system with an NVIDIA GPU to run CogVLM with Inference.
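Because CogVLM needs an NVIDIA GPU, it is worth confirming that the GPU is visible once you are signed in to the VM. The following is a minimal sketch that shells out to `nvidia-smi`; it assumes the NVIDIA driver, which provides `nvidia-smi`, is installed on the machine, and it returns an empty list if the tool is missing or fails.

```python
import shutil
import subprocess


def parse_gpu_list(csv_output: str) -> list[tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output."""
    gpus = []
    for line in csv_output.strip().splitlines():
        if not line.strip():
            continue
        # Each line holds "<name>, <total memory>"
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, str]]:
    """Return (name, total memory) for each NVIDIA GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if not gpus:
        print("No NVIDIA GPU detected; CogVLM will not run with Inference on this machine.")
    for name, memory in gpus:
        print(f"Found GPU: {name} ({memory})")
```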
A cost panel on the right of the screen estimates the cost of the machine you are configuring.
Fill out the required fields to configure your virtual machine. Then, click the “Create” button to create a virtual machine. It will take a few moments before your machine is ready. You can view the status from the Compute Engine Instances page.
When your virtual machine has been deployed, click on the machine name in the list of virtual machines on the Compute Engine Instances page.
To sign in using SSH in a terminal, click the arrow next to the SSH button and click “View gcloud command”. If you have not already installed gcloud, follow the gcloud installation and configuration instructions to get started.
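If you prefer to script the SSH step, the sketch below builds the same kind of `gcloud compute ssh` command that the console shows under “View gcloud command”. The instance name, zone, and project in the usage lines are hypothetical placeholders; copy the real values from the Compute Engine Instances page. The script only prints the command, so you can review it before running it.

```python
import shlex
from typing import Optional


def gcloud_ssh_command(instance: str, zone: str, project: Optional[str] = None) -> list[str]:
    """Build the argument list for `gcloud compute ssh` to a Compute Engine VM."""
    cmd = ["gcloud", "compute", "ssh", instance, f"--zone={zone}"]
    if project:
        cmd.append(f"--project={project}")
    return cmd


if __name__ == "__main__":
    # Hypothetical placeholder values: replace with your VM's name, zone, and project.
    cmd = gcloud_ssh_command("cogvlm-vm", "us-central1-a", "my-gcp-project")
    # Print a shell-quoted command you can review and paste into your terminal.
    print(shlex.join(cmd))
```

To run it from Python instead of pasting it, pass the list to `subprocess.run(cmd, check=True)` once gcloud is installed and authenticated.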
The Roboflow Inference Server allows you to deploy computer vision models to a range of devices, including GCP.
The Inference Server relies on Docker to run. If you don't already have Docker installed on the device(s) on which you want to run inference, install it by following the official Docker installation instructions.
Once you have Docker installed, run the following commands on your GCP virtual machine to install the Inference CLI and start the Roboflow Inference Server:
pip install inference inference-cli
inference server start
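The server runs in a Docker container and may take a while to become available, especially on the first start. The sketch below polls the server until it answers, assuming it listens on port 9001 (the port the example request in this guide uses). Any HTTP response, including an error status such as 404, is treated as a sign that the server is up; the sketch does not assume any particular health-check route.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(
    base_url: str = "http://localhost:9001",
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Return True once base_url answers an HTTP request, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An error status (e.g. 404) still means something is listening on the port.
            return True
        except OSError:
            # Connection refused, reset, or timed out: the server is not ready yet.
            time.sleep(interval)
    return False
```

For example, `wait_for_server()` returns True as soon as the local server responds; if it returns False, check that the container is running with `docker ps`.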
Now that the Roboflow Inference Server is running, you can use your model on GCP.
The Roboflow Inference Server provides an HTTP API with a range of methods you can use to query your model and various popular models (e.g. SAM and CLIP). You can read more about all of the API methods available on the Roboflow Inference Server in the Inference Server documentation.
The Roboflow Python SDK provides convenience methods for interacting with the HTTP API. You can also query the HTTP API itself, which is what the script below does using the requests library.
To install the Python SDK along with requests, run the following command:
pip install roboflow requests
Create a new Python file and add the following code:
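import base64
import requests
PORT = 9001  # port the Inference server listens on
API_KEY = ""  # your Roboflow API key
IMAGE_PATH = "forklift.png"  # image to send to CogVLM
def encode_base64(image_path):
    # Read the image file and return its contents as a base64-encoded ASCII string
    with open(image_path, "rb") as image:
        x = image.read()
    image_string = base64.b64encode(x)
    return image_string.decode("ascii")
prompt = "Read the text in this image."
# Request body: the image, your API key, and the text prompt for CogVLM
infer_payload = {
    "image": {
        "type": "base64",
        "value": encode_base64(IMAGE_PATH),
    },
    "api_key": API_KEY,
    "prompt": prompt,
}
results = requests.post(
    f"http://localhost:{PORT}/llm/cogvlm",
    json=infer_payload,
)
results.raise_for_status()  # raise an exception if the server returned an error status
print(results.json())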
This code makes an HTTP POST request to the /llm/cogvlm route on your Inference server. The route accepts a text prompt and a base64-encoded image, sends both to CogVLM for processing, and returns a JSON object containing the model's text response.
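If you plan to send many requests, you may want to wrap the request in a reusable function. The sketch below builds the same payload fields as the script (image as base64, api_key, prompt) and posts them to /llm/cogvlm using only the standard library's urllib, so it does not need requests. The function names are hypothetical, and the long default timeout is an assumption meant to leave room for CogVLM's generation time.

```python
import base64
import json
import urllib.request
from typing import Any


def build_cogvlm_payload(image_bytes: bytes, prompt: str, api_key: str) -> dict[str, Any]:
    """Build the JSON body for the /llm/cogvlm route from raw image bytes."""
    return {
        "image": {"type": "base64", "value": base64.b64encode(image_bytes).decode("ascii")},
        "api_key": api_key,
        "prompt": prompt,
    }


def ask_cogvlm(
    image_path: str,
    prompt: str,
    api_key: str,
    port: int = 9001,
    timeout: float = 300.0,
) -> dict[str, Any]:
    """POST an image and prompt to a local Inference server and return the parsed JSON."""
    with open(image_path, "rb") as f:
        payload = build_cogvlm_payload(f.read(), prompt, api_key)
    request = urllib.request.Request(
        f"http://localhost:{port}/llm/cogvlm",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `ask_cogvlm("forklift.png", "Is there a forklift close to a conveyor belt?", API_KEY)` returns the parsed JSON response. Because urlopen raises on error statuses, failures surface as exceptions instead of as printed error JSON.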
Above, replace:
1. API_KEY with your Roboflow API key. Learn how to retrieve your Roboflow API key.
2. IMAGE_PATH (forklift.png) with the path to the image that you want to use to make a request.
3. prompt with the question you want to ask.
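Rather than editing the script each time, you could read these values from environment variables. The sketch below shows one way to do that; ROBOFLOW_API_KEY matches the variable used in the upload script earlier in this guide, while COGVLM_IMAGE_PATH, COGVLM_PROMPT, and INFERENCE_PORT are hypothetical names chosen for this example. The defaults mirror the values in the request script.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CogVLMRequestConfig:
    api_key: str
    image_path: str
    prompt: str
    port: int = 9001


def load_config(env: Optional[Mapping[str, str]] = None) -> CogVLMRequestConfig:
    """Build request settings from environment variables, falling back to the script's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ROBOFLOW_API_KEY", "")
    if not api_key:
        raise ValueError("Set ROBOFLOW_API_KEY to your Roboflow API key.")
    return CogVLMRequestConfig(
        api_key=api_key,
        # Hypothetical variable names; the defaults match the example script.
        image_path=env.get("COGVLM_IMAGE_PATH", "forklift.png"),
        prompt=env.get("COGVLM_PROMPT", "Read the text in this image."),
        port=int(env.get("INFERENCE_PORT", "9001")),
    )
```

With ROBOFLOW_API_KEY exported in your shell, `load_config()` returns a config object whose fields can replace the hard-coded API_KEY, IMAGE_PATH, prompt, and PORT values.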
For example, you can run the code on an image of a forklift (forklift.png) with prompt set to the question “Is there a forklift close to a conveyor belt?”.