DeepStream is NVIDIA's platform for building highly optimized video processing pipelines accelerated by NVIDIA hardware, taking full advantage of TensorRT for accelerated inference and CUDA for parallel processing. It targets many of the same business problems as Inference, including security camera monitoring, smart cities, and industrial IoT.
DeepStream has a reputation for being difficult to use, with a steep learning curve. It requires familiarity with NVIDIA tooling, and while it is highly configurable, it is also highly complex. It is focused on video processing and lacks deep integrations with other tooling. DeepStream is not open source; ensure that the license is suitable for your project.
Choose DeepStream if: you're an expert willing to invest significant time and effort into optimizing a single project, and high throughput is your primary objective.
If you're deeply ingrained in the TensorFlow ecosystem and want to deploy a variety of TensorFlow models across modalities like NLP, recommender systems, and audio in addition to CV models, TensorFlow Serving may be a good choice.
It can be complex to set up and maintain, and it lacks features many users would consider table stakes (such as pre- and post-processing, which in many cases will need to be custom coded). Like several of the other servers listed here, it lacks depth in vision-specific functionality.
Choose TensorFlow Serving if: the TensorFlow ecosystem is very important to you and you're willing to put in the legwork to take advantage of its advanced feature set.
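To give a sense of the setup involved, the sketch below shows the standard way to stand up TensorFlow Serving with Docker and query it over REST. The model path, model name, and input shape are placeholders; any pre- or post-processing (e.g., image decoding and resizing for a CV model) would sit outside this and need to be written yourself.

```shell
# Serve a SavedModel with the official TensorFlow Serving image.
# /path/to/saved_model and my_model are placeholders for your own model.
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/saved_model,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

# Query the model over the REST API (input shape depends on your model).
curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -d '{"instances": [[1.0, 2.0, 5.0]]}'
```

Note that the request body must already be the model's tensor input; unlike vision-focused servers, there is no built-in step that turns a raw image into that tensor for you.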
Inference turns any computer or edge device into a command center for your computer vision projects.