The PyTorch ecosystem's equivalent of TensorFlow Serving is TorchServe. It's optimized for serving PyTorch models across several domains including vision, NLP, tabular data, and audio.
Like TensorFlow Serving, it is designed for large-scale cloud deployments and can require custom configuration for things like pre- and post-processing and deploying multiple models. Because of its wide mandate, it lacks many vision-specific features (like video streaming).
Choose TorchServe if: you're looking for a way to scale and customize the deployment of your PyTorch models and don't need vision-specific functionality.
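To give a feel for the kind of custom pre- and post-processing mentioned above, here is a minimal sketch of the three-stage layout (preprocess, inference, postprocess) that a custom handler typically follows. It is plain Python and does not import TorchServe; in a real deployment you would usually subclass `BaseHandler` from `ts.torch_handler.base_handler` instead. `SketchHandler`, `toy_model`, the `"pixels"` request field, and the labels are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List


class SketchHandler:
    """Hypothetical stand-in that mirrors a three-stage handler layout."""

    def __init__(self, model: Callable[[List[float]], List[float]], labels: List[str]):
        self.model = model
        self.labels = labels

    def preprocess(self, requests: List[Dict]) -> List[List[float]]:
        # Scale raw 0-255 pixel values into the 0-1 range.
        # The "pixels" key is an assumption made for this sketch.
        return [[v / 255.0 for v in req["pixels"]] for req in requests]

    def inference(self, batch: List[List[float]]) -> List[List[float]]:
        # Run the model on each preprocessed input.
        return [self.model(x) for x in batch]

    def postprocess(self, outputs: List[List[float]]) -> List[Dict]:
        # Turn raw scores into a label plus its score (simple argmax).
        results = []
        for scores in outputs:
            best = max(range(len(scores)), key=scores.__getitem__)
            results.append({"label": self.labels[best], "score": scores[best]})
        return results

    def handle(self, requests: List[Dict]) -> List[Dict]:
        # Chain the three stages for one batch of requests.
        return self.postprocess(self.inference(self.preprocess(requests)))


def toy_model(x: List[float]) -> List[float]:
    # Hypothetical "model": scores are [mean brightness, 1 - mean brightness].
    mean = sum(x) / len(x)
    return [mean, 1.0 - mean]


handler = SketchHandler(toy_model, labels=["bright", "dark"])
predictions = handler.handle([{"pixels": [255, 255, 255]}, {"pixels": [0, 0, 0]}])
print(predictions)
```

The point of the split is that each stage can be customized independently, which is the sort of configuration work a general-purpose server leaves to you.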
Triton is a powerhouse tool for machine learning experts to deploy ML models at scale. Its primary focus is on highly optimized pipelines that run efficiently on NVIDIA hardware. It is geared towards expert users and can be tough to use, trading off simplicity and a quick development cycle for raw speed. It can chain models together, but doing so is a rigid and manual process.
Inference turns any computer or edge device into a command center for your computer vision projects.