Roboflow Cloud Hosting

You can deploy any Workflow built with Roboflow in three ways:

  • Run a Workflow or a model on an image.
  • Run a Workflow or a model on stored video files.
  • Run a Workflow or a model on images or video streams using dedicated infrastructure.

All of these options are backed by Roboflow Inference, our high-performance, open source computer vision inference server.

Enterprises of all sizes – from small businesses to Fortune 100 companies – use Inference to run, manage, and scale vision deployments. Tens of millions of API calls are processed every month in the Roboflow Cloud.

Advantages of Using Roboflow Cloud Hosting

Running your computer vision models in the Roboflow Cloud – whether using our serverless API, our video API, or dedicated CPU/GPUs running Roboflow Inference – gives you tight integration with the rest of the Roboflow ecosystem.

By hosting with Roboflow, you get:

  1. Infrastructure that runs Inference, a high-performance computer vision server trusted with running millions of API calls a month;
  2. Pricing based on use, rather than by compute hour;
  3. Model monitoring which lets you see how your models are performing in production. Model monitoring works at any scale, from 1 to 100 models and beyond;
  4. The ability to run complex, multi-stage Workflows created in the Roboflow Workflows editor without having to manage your own infrastructure, and more.

Comparing Roboflow Cloud Hosting Options

 
|  | Serverless API | Hosted Video API | Dedicated Deployments |
| --- | --- | --- | --- |
| How it Works | Send an image to the Roboflow API and get a result with predictions. You can call Workflows or individual models. | Send a video, then get results when the video is processed. You can apply one or more models to frames in a video. Workflows are not supported. | Provision dedicated CPUs and GPUs for your workloads. You can call Workflows or individual models. |
| Best Suited For | Processing images at any scale. | Processing stored videos at any scale. | Running foundation models like SAM-2, CLIP, PaliGemma-2, and Florence-2. |
| Scales | Automatically as you make more API calls. | Automatically as you send more videos. | Capable of processing up to 40 RPS at < 250ms latency. |
| Supported Models | All Roboflow Workflows. Object detection, segmentation, classification, and keypoint models. | Object detection, segmentation, classification, gaze detection, and keypoint models. CLIP is also supported. | All Roboflow Workflows. All models hosted on or uploaded to Roboflow. Foundation models like CLIP, SAM-2, and Florence-2. |
| Pricing | Pay per model inference. | Pay based on video length and FPS. | Pay per hour of server utilization. |
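For a concrete sense of the "How it Works" row for the Serverless API, here is a minimal sketch of calling it from Python with the inference-sdk package. The model ID reuses the yard-management-system/4 sample from the benchmark section below; substitute your own model ID and API key.

from inference_sdk import InferenceHTTPClient

# Point the client at the Serverless API endpoint.
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY",
)

# Send one image; the response is a dictionary of predictions.
result = client.infer("path/to/image.jpg", model_id="yard-management-system/4")
print(result["predictions"])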

Because the Video Inference API can batch work to use the GPU efficiently and tolerates higher latency, processing stored video (as opposed to real-time streaming) can be up to 100x cheaper than sending individual frames to the image-based Roboflow Hosted Inference API.

Serverless API and Dedicated Deployment Latency

The amount of latency you can expect from the Serverless API depends on the model architecture you are using. The chart below shows latencies measured by running hundreds of API calls against the Serverless API from a cloud server hosted on GCP:

The Roboflow Serverless API scales with use: no matter how many calls you make, the API will scale as needed.

When running individual models trained on or uploaded to Roboflow (e.g., object detection or segmentation models), Dedicated Deployments on GPUs offer significant performance benefits:

Here is the latency of Dedicated Deployments vs. calling the Serverless API:

Dedicated Deployments do not scale automatically, unlike the Serverless API. That said, you can provision additional Dedicated Deployments as your workload grows.

API Latency by Location

Latency will vary by the location from which you make requests. For a Roboflow Train 3.0 Fast model that we benchmarked, we found the following latencies by location:

  • San Francisco (US West Coast): 250-350ms
  • New York City (US East Coast): 185-350ms
  • Scotland, UK: 400-550ms

These results were calculated using the benchmarking command shown below.

You can run your own benchmarks on latency from the Roboflow Serverless API using the following commands:

pip install inference

inference benchmark api-speed --api-key YOUR_API_KEY --model_id yard-management-system/4 --host https://detect.roboflow.com --rps=3 --benchmark_requests=200 --warm_up_requests=50
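Here, --rps sets the request rate, --benchmark_requests sets how many measured calls are made, and --warm_up_requests issues initial unmeasured calls so cold starts do not skew the results. Swap in your own API key and model ID to benchmark the models you plan to deploy.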

Workflow Latency

When you are using Roboflow Workflows either with the Serverless API or Dedicated Deployments, the latency you can expect will vary depending on the blocks in your Workflow. For example, a Workflow that runs an object detection model and crops results will be faster than one that runs an object detection model, crops results, then classifies each of the crops.
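As a rough sketch, here is how a Workflow can be invoked through the Serverless API with the inference-sdk package. The workspace name and Workflow ID below are hypothetical placeholders; copy the real identifiers from your Workflows editor. The more blocks the Workflow chains together, the longer each call will take.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY",
)

# "your-workspace" and "detect-crop-classify" are placeholders for the
# values shown in the Workflows editor.
result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="detect-crop-classify",
    images={"image": "path/to/image.jpg"},
)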

Roboflow Cloud Hosting Alternatives

If reducing latency is essential and/or you don’t have an active internet connection, consider on-device deployments that run Roboflow Inference: these run models and Workflows offline, on your own hardware.
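As a minimal sketch, the same models can be loaded and run locally with the open source inference Python package (pip install inference). After the weights are downloaded once, predictions are computed on your own hardware with no per-call network round trip. The model ID below is a placeholder.

from inference import get_model

# Downloads the model weights once, then runs inference locally.
model = get_model(model_id="yard-management-system/4", api_key="YOUR_API_KEY")

results = model.infer("path/to/image.jpg")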

Get Started

Ready to start deploying? Check out our documentation to get up and running with the Roboflow Serverless API, Video API, or a Dedicated Deployment in under five minutes.

Need advice as you plan and set up a business deployment? Contact the Roboflow sales team to speak with a deployment expert and discuss bulk pricing.