DeepStream is NVIDIA's platform for building highly optimized, hardware-accelerated video processing pipelines, taking full advantage of TensorRT for accelerated inference and CUDA for parallel processing. It targets many of the same business problems as Inference, including security camera monitoring, smart cities, and industrial IoT.
DeepStream has a reputation for being difficult to use, with a steep learning curve. It requires familiarity with NVIDIA tooling, and while it is highly configurable, it's also highly complex. It's focused on video processing and lacks deep integrations with other tooling. DeepStream is not open source; ensure that the license is suitable for your project.
Choose DeepStream if: you're an expert willing to invest a lot of time and effort into optimizing a single project, and high throughput is your primary objective.
In the olden days, most people rolled their own servers to expose their ML models to client applications. In fact, Roboflow Inference's HTTP interface and REST API are built on FastAPI.
In this day and age, it's certainly still possible to start from scratch, but you'll be reinventing the wheel and hitting footguns that others have already worked around. It's usually better and faster to use one of the existing ML-focused servers.
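To make the trade-off concrete, here's a minimal sketch of what rolling your own inference server with FastAPI might look like. The route, payload shape, and placeholder predict logic are all illustrative assumptions, not Roboflow Inference's actual API:

```python
# Minimal hand-rolled inference server sketch using FastAPI.
# Everything below (route, request/response shapes, predict logic)
# is illustrative, not Roboflow Inference's actual interface.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class PredictRequest(BaseModel):
    image: str  # hypothetical payload: a base64-encoded image


class PredictResponse(BaseModel):
    predictions: list[dict]  # hypothetical: one dict per detection


@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Placeholder: decode the image and run your model here.
    # A production server also needs batching, GPU scheduling,
    # auth, timeouts, and error handling. That is the wheel you
    # would be reinventing.
    return PredictResponse(predictions=[{"class": "example", "confidence": 0.0}])


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```

The endpoint itself takes a few lines; it's everything behind those placeholder comments that existing ML-focused servers already provide for you.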
Choose FastAPI or Flask if: your main goal is learning the intricacies of making an inference server.
Inference turns any computer or edge device into a command center for your computer vision projects.