You can deploy any Workflow built with Roboflow in three ways:
All of these options are backed by Roboflow Inference, our high-performance, open source computer vision inference server, and together they make up Roboflow's Managed Compute offerings.
Organizations of all sizes – from small businesses to Fortune 100 companies – use Inference to run, manage, and scale vision deployments. Roboflow's Managed Compute solutions process tens of millions of API calls every month.
Running your computer vision models with a Roboflow Managed Compute solution – whether using our Serverless Hosted API, our Video API, or Dedicated CPU/GPUs running Roboflow Inference – gives you tight integration with the rest of the Roboflow ecosystem.
By hosting with Roboflow, you get:
The latency you can expect from the Serverless Hosted API depends on the model architecture you are using. The chart below was generated by running hundreds of API calls against the Serverless Hosted API from a cloud server hosted on GCP:
The Roboflow Serverless Hosted API scales automatically with use: no matter how many calls you make, capacity is provisioned as needed.
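For reference, here is a minimal sketch of calling the Serverless Hosted API with the Python inference-sdk; the model ID and image path are placeholders for your own values:

```python
# Minimal sketch: send one image to the Serverless Hosted API.
# The model ID and image path below are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # Serverless Hosted API endpoint
    api_key="YOUR_ROBOFLOW_API_KEY",
)

result = client.infer("path/to/image.jpg", model_id="your-project/1")
print(result)
```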
When running individual models trained on or uploaded to Roboflow (e.g. object detection or segmentation models), Dedicated Deployments on GPUs offer significant performance benefits:
Here is the latency of Dedicated Deployments vs. calling the Serverless Hosted API:
Unlike the Serverless Hosted API, Dedicated Deployments do not scale automatically. That said, you can provision additional Dedicated Deployments as your needs grow.
Latency will vary by the location from which you make requests. For a Roboflow Train 3.0 Fast model that we benchmarked against the Roboflow Serverless API, we found the following latencies by location:
These results were calculated using the benchmarking commands shown below.
You can run your own latency benchmarks against the Roboflow Serverless API using the following commands:
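As a minimal sketch of such a benchmark, the snippet below uses the Python inference-sdk to time repeated requests against a hosted model; the model ID, image path, and request count are placeholders to replace with your own values:

```python
# Rough latency benchmark sketch using the Python inference-sdk.
# Model ID, image path, and request count are placeholders.
import time

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # or your Dedicated Deployment URL
    api_key="YOUR_ROBOFLOW_API_KEY",
)

latencies = []
for _ in range(100):
    start = time.perf_counter()
    client.infer("path/to/image.jpg", model_id="your-project/1")
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"median: {latencies[len(latencies) // 2] * 1000:.1f} ms")
print(f"p95:    {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")
```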
You can benchmark a model running on a Dedicated Deployment by replacing the https://detect.roboflow.com endpoint with your Dedicated Deployment API URL.
When you run Roboflow Workflows, whether on the Serverless API or a Dedicated Deployment, the latency you can expect will vary depending on the blocks in your Workflow. For example, a Workflow that runs an object detection model and crops the results will be faster than one that runs an object detection model, crops the results, then classifies each crop.
You can benchmark a Workflow using the following command:
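As a sketch of such a benchmark, the snippet below uses the inference-sdk's run_workflow helper to time repeated Workflow runs; the workspace name, Workflow ID, image path, and image input name are placeholders for your own values:

```python
# Rough Workflow latency benchmark sketch using the Python inference-sdk.
# Workspace name, Workflow ID, and image path are placeholders.
import time

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # or your Dedicated Deployment URL
    api_key="YOUR_ROBOFLOW_API_KEY",
)

latencies = []
for _ in range(100):
    start = time.perf_counter()
    client.run_workflow(
        workspace_name="your-workspace",
        workflow_id="your-workflow-id",
        images={"image": "path/to/image.jpg"},  # key must match your Workflow's image input name
    )
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"median: {latencies[len(latencies) // 2] * 1000:.1f} ms")
```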
You can benchmark a Workflow running on a Dedicated Deployment by replacing the https://detect.roboflow.com endpoint with your Dedicated Deployment API URL.
If minimizing latency is essential, or you don’t have an active internet connection, you may want to consider on-device deployments that run Roboflow Inference. These deployments run models and Workflows offline, on your own hardware.
Ready to start deploying? Check out our documentation to get up and running with the Roboflow Serverless API, Video API, or a Dedicated Deployment in under five minutes:
Need advice as you plan and set up a business deployment? Contact the Roboflow sales team to speak with a deployment expert and discuss bulk pricing.