You can deploy any Workflow built with Roboflow in three ways:
All of these options are backed by Roboflow Inference, our high-performance, open source computer vision inference server.
Enterprises of all sizes – from small businesses to Fortune 100 companies – use Inference to run, manage, and scale vision deployments. Tens of millions of API calls are processed every month in the Roboflow Cloud.
Running your computer vision models in the Roboflow Cloud – whether using our serverless API, our video API, or dedicated CPU/GPUs running Roboflow Inference – gives you tight integration with the rest of the Roboflow ecosystem.
By hosting with Roboflow, you get:
Because the Video Inference API efficiently batches frames to fully utilize the GPU, and because stored video can tolerate higher latency, it can be up to 100x cheaper for processing stored video (as opposed to real-time streams) than the image-based Roboflow Hosted Inference API.
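To give a sense of the developer experience, here is a minimal sketch of submitting a stored video for batch processing with the roboflow Python package. The project ID, version, and file name below are placeholders, and the exact method names may differ slightly between package versions:

from roboflow import Roboflow

# Placeholder API key, project, and version; replace with your own
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace().project("yard-management-system")
model = project.version(4).model

# Submit a stored video for batch processing
job_id, signed_url, expire_time = model.predict_video(
    "warehouse.mp4",
    fps=5,
    prediction_type="batch-video",
)

# Wait for the job to finish, then retrieve predictions for the sampled frames
results = model.poll_until_video_results(job_id)
print(results)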
The latency you can expect from the Serverless API depends on the model architecture you are using. The chart below shows latencies measured by running hundreds of API calls against the Serverless API from a cloud server hosted on GCP:
The Roboflow Serverless API scales with use: no matter how many calls you make, capacity is provisioned automatically to meet demand.
When running individual models trained on or uploaded to Roboflow (e.g., object detection or segmentation models), Dedicated Deployments on GPUs offer significant performance benefits:
Here is the latency of Dedicated Deployments vs. calling the Serverless API:
Unlike the Serverless API, Dedicated Deployments do not scale automatically. You can, however, provision additional Dedicated Deployments as your needs grow.
Latency will vary by the location from which you make requests. For a Roboflow Train 3.0 Fast model that we benchmarked, we found the following latencies by location:
These results were calculated using the benchmarking command shown below.
You can run your own benchmarks on latency from the Roboflow Serverless API using the following commands:
pip install inference
inference benchmark api-speed --api-key YOUR_API_KEY --model_id yard-management-system/4 --host https://detect.roboflow.com --rps=3 --benchmark_requests=200 --warm_up_requests=50
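Once you know the latency to expect, making individual requests against the Serverless API looks like the sketch below, which uses the inference-sdk Python client. The model ID matches the one benchmarked above; the API key and image path are placeholders:

from inference_sdk import InferenceHTTPClient

# Connect to the Serverless API (placeholder API key)
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="YOUR_API_KEY",
)

# Run a single prediction against the hosted model
result = client.infer("dock.jpg", model_id="yard-management-system/4")
print(result)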
When you are using Roboflow Workflows either with the Serverless API or Dedicated Deployments, the latency you can expect will vary depending on the blocks in your Workflow. For example, a Workflow that runs an object detection model and crops results will be faster than one that runs an object detection model, crops results, then classifies each of the crops.
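To run a Workflow rather than a single model, you can point the same client at your Workflow by ID. The workspace name, workflow ID, and input image in this sketch are placeholders; substitute the values from your own Workflows editor:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # or your Dedicated Deployment URL
    api_key="YOUR_API_KEY",
)

# Workspace name and workflow ID are placeholders
result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="your-workflow-id",
    images={"image": "dock.jpg"},
)
print(result)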
If minimizing latency is essential, or you don't have an active internet connection, consider an on-device deployment running Roboflow Inference: these deployments run models and Workflows entirely offline.
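As a rough sketch of what an on-device deployment can look like, the snippet below uses the open source inference package to run a model on a local video stream. The model ID and camera index are placeholders, and your API key is needed the first time the model weights are downloaded:

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

# Model ID and camera index are placeholders; replace with your own
pipeline = InferencePipeline.init(
    model_id="yard-management-system/4",
    video_reference=0,           # webcam; can also be a video file or RTSP URL
    on_prediction=render_boxes,  # draw predictions on each frame
    api_key="YOUR_API_KEY",
)
pipeline.start()
pipeline.join()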
Ready to start deploying? Check out our documentation that will get you running with the Roboflow Serverless API, Video API, or a Dedicated Deployment in under five minutes:
Need advice as you plan and set up a business deployment? Contact the Roboflow sales team to speak with a deployment expert and discuss bulk pricing.