Roboflow Managed Compute

You can deploy any Workflow or model built with Roboflow in three ways:

  • Run a Workflow or a model on an image.
  • Run a Workflow or a model on stored video files.
  • Run a Workflow or a model on images or video streams using dedicated infrastructure.

All of these options are backed by Roboflow Inference, our high-performance, open source computer vision inference server, and together they make up Roboflow Managed Compute's offerings.
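For example, here is a minimal sketch of calling the Serverless Hosted API on a single image with the inference-sdk Python client. The image path is a placeholder, and printing the "predictions" key assumes an object detection model:

# pip install inference-sdk
from inference_sdk import InferenceHTTPClient

# Point the client at the Serverless Hosted API
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY",
)

# Run a single model on a local image
result = client.infer("example.jpg", model_id="yard-management-system/4")
print(result["predictions"])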

Enterprises of all sizes – from small businesses to Fortune 100 companies – use Inference to run, manage, and scale vision deployments. Tens of millions of API calls are processed every month on Roboflow's Managed Compute solutions.

Advantages of Using Roboflow Managed Compute

Running your computer vision models with a Roboflow Managed Compute solution – whether using our Serverless Hosted API, our Video API, or Dedicated CPU/GPUs running Roboflow Inference – gives you tight integration with the rest of the Roboflow ecosystem.

By hosting with Roboflow, you get:

  1. Infrastructure that runs Inference, a high-performance computer vision server trusted with running tens of millions of API calls a month;
  2. Pricing based on use, rather than by compute hour;
  3. Model monitoring, which lets you see how your models are performing in production at any scale, from 1 to 100 models and beyond;
  4. The ability to run complex, multi-stage Workflows created in the Roboflow Workflows editor without having to manage your own infrastructure, and more.

Comparing Roboflow Managed Compute Options

 
Serverless Hosted API

  • How it works: Send an image to the Roboflow API and get a result with predictions. You can call Workflows or individual models.
  • Best suited for: Processing images at any scale.
  • Scaling: Automatic as you make more API calls.
  • Supported models: All Roboflow Workflows; object detection, segmentation, classification, and keypoint models.
  • Pricing: Pay per model inference.

Hosted Video API

  • How it works: Send a video, then get results when the video is processed. You can apply one or more models to frames in a video. Workflows are not supported.
  • Best suited for: Processing stored videos at any scale.
  • Scaling: Automatic as you send more videos.
  • Supported models: Object detection, segmentation, classification, gaze detection, and keypoint models. CLIP is also supported.
  • Pricing: Pay based on video length and FPS.

Dedicated Deployments

  • How it works: Provision dedicated CPUs and GPUs for your workloads. You can call Workflows or individual models.
  • Best suited for: Running foundation models like SAM-2, CLIP, PaliGemma-2, and Florence-2.
  • Scaling: Manual; a deployment can process up to 40 RPS at under 250 ms latency, and you can provision more as needed.
  • Supported models: All Roboflow Workflows; all models hosted on or uploaded to Roboflow; foundation models like CLIP, SAM-2, and Florence-2.
  • Pricing: Pay per hour of server utilization.

Serverless Hosted API and Dedicated Deployment Latency

The latency you can expect from the Serverless Hosted API depends on the model architecture you are using. The figures below were gathered by running hundreds of API calls against the Serverless Hosted API from a cloud server hosted on GCP.

The Roboflow Serverless Hosted API scales with use: no matter how many calls you make, the API will scale as needed.

When running individual models trained on or uploaded to Roboflow (e.g., object detection or segmentation models), Dedicated Deployments on GPUs offer significant latency benefits over calling the Serverless Hosted API.

Unlike the Serverless Hosted API, Dedicated Deployments do not scale automatically. You can, however, provision additional Dedicated Deployments as your needs grow.

Serverless Hosted API Latency by Location

Latency will vary by the location from which you make requests. For a Roboflow Train 3.0 Fast model that we benchmarked, we found the following latencies when querying the Roboflow Serverless API:

  • San Francisco (US West Coast): 250-350ms
  • New York City (US East Coast): 185-350ms
  • Scotland, UK: 400-550ms

These results were calculated using the benchmarking command below.

You can run your own benchmarks on latency from the Roboflow Serverless API using the following commands:


pip install inference

inference benchmark api-speed --api-key YOUR_API_KEY --model_id yard-management-system/4 --host https://detect.roboflow.com --rps=3 --benchmark_requests=200 --warm_up_requests=50

You can benchmark a model running on a Dedicated Deployment by replacing the https://detect.roboflow.com endpoint with your Dedicated Deployment API URL.
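For example, with a placeholder standing in for your deployment's URL:

inference benchmark api-speed --api-key YOUR_API_KEY --model_id yard-management-system/4 --host YOUR_DEDICATED_DEPLOYMENT_URL --rps=3 --benchmark_requests=200 --warm_up_requests=50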

Workflow Latency

When you are using Roboflow Workflows either with the Serverless API or Dedicated Deployments, the latency you can expect will vary depending on the blocks in your Workflow. For example, a Workflow that runs an object detection model and crops results will be faster than one that runs an object detection model, crops results, then classifies each of the crops.

You can benchmark a Workflow using the following command:


pip install inference

inference benchmark api-speed --workflow-id custom-workflow-11 --host https://detect.roboflow.com --api-key YOUR_KEY --benchmark_requests 1000

You can benchmark a Workflow running on a Dedicated Deployment by replacing the https://detect.roboflow.com endpoint with your Dedicated Deployment API URL.
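If you would rather measure Workflow latency from your own application code, here is a minimal sketch using the inference-sdk Python client. The workspace name and image path are placeholders, and the Workflow ID reuses the example above:

# pip install inference-sdk
import time

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # or your Dedicated Deployment API URL
    api_key="YOUR_API_KEY",
)

# Time a single end-to-end Workflow run
start = time.perf_counter()
result = client.run_workflow(
    workspace_name="your-workspace",   # placeholder
    workflow_id="custom-workflow-11",  # from the benchmark example above
    images={"image": "example.jpg"},   # placeholder image
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Workflow returned in {elapsed_ms:.0f} ms")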

Roboflow Cloud Hosting Alternatives

If minimizing latency is essential, or you don’t have an active internet connection, you may want to consider an on-device deployment that runs Roboflow Inference. On-device deployments run models and Workflows offline.
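As a minimal sketch of what this looks like, assuming the open source inference Python package (model weights are downloaded on first load, after which inference runs entirely on-device):

# pip install inference
from inference import get_model

# Load the model onto the local device
model = get_model(model_id="yard-management-system/4", api_key="YOUR_API_KEY")

# Run inference locally, with no per-call round trip to a hosted API
results = model.infer("example.jpg")
print(results)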

Get Started

Ready to start deploying? Check out our documentation, which will get you up and running with the Roboflow Serverless API, Video API, or a Dedicated Deployment in under five minutes.

Need advice as you plan and set up a business deployment? Contact the Roboflow sales team to speak with a deployment expert and discuss bulk pricing.