The PyTorch ecosystem's equivalent of TensorFlow Serving is TorchServe. It is optimized for serving PyTorch models across several domains, including vision, NLP, tabular data, and audio.
Like TensorFlow Serving, it is designed for large-scale cloud deployments and can require custom configuration for things like pre- and post-processing and deploying multiple models. Because of its wide mandate, it lacks many vision-specific features (like video streaming).
Choose TorchServe if: you're looking for a way to scale and customize the deployment of your PyTorch models and don't need vision-specific functionality.
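To give a sense of what that custom configuration looks like, here is a minimal sketch of a TorchServe custom handler that overrides pre- and post-processing. The transform pipeline and the argmax post-processing are illustrative placeholders, not part of any particular model's requirements.

```python
# custom_handler.py -- a minimal TorchServe custom handler sketch.
# The transforms and post-processing below are illustrative placeholders.
import io

import torch
from PIL import Image
from torchvision import transforms

from ts.torch_handler.base_handler import BaseHandler


class ImageClassifierHandler(BaseHandler):
    """Overrides TorchServe's default pre- and post-processing."""

    def preprocess(self, data):
        # TorchServe passes a list of request dicts; the payload
        # typically arrives under the "data" or "body" key.
        transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ])
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(transform(image))
        return torch.stack(images)

    def postprocess(self, inference_output):
        # Return one JSON-serializable result per request in the batch.
        return inference_output.argmax(dim=1).tolist()
```

The handler is then packaged alongside the model weights with the torch-model-archiver CLI and loaded by torchserve at startup.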
LitServe is a lightweight, customizable inference server focused on serving models with minimal overhead. It is minimalistic but flexible and self-contained.
Like Triton, LitServe is task-agnostic, meaning it is designed to balance the needs of vision models with those of NLP, audio, and tabular models. This means it's not as feature-rich for computer vision applications (for example, it doesn't have any built-in features for streaming video). It is also tightly focused on model serving, without an abstraction layer like Workflows (offered by Roboflow Inference) for model chaining and integration with other tools.
Choose LitServe if: you are working on general-purpose machine learning tasks and were previously considering rolling your own server but want a more featureful starting point.
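As an illustration of that starting point, here is a minimal LitServe server built on its LitAPI interface; the linear stand-in model and the request/response field names are placeholders.

```python
# server.py -- a minimal LitServe server sketch; the model is a placeholder.
import litserve as ls
import torch


class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker; a trivial stand-in model here.
        self.model = torch.nn.Linear(4, 2).to(device).eval()
        self.device = device

    def decode_request(self, request):
        # Map the incoming JSON payload to a model input tensor.
        return torch.tensor(request["input"], dtype=torch.float32).to(self.device)

    def predict(self, x):
        with torch.no_grad():
            return self.model(x)

    def encode_response(self, output):
        # Map the model output back to a JSON-serializable response.
        return {"output": output.tolist()}


if __name__ == "__main__":
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto")
    server.run(port=8000)
```

Everything beyond these four methods (batching, scaling, hardware placement) is handled by the server itself, which is what makes it a stronger baseline than a hand-rolled FastAPI wrapper.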
Inference turns any computer or edge device into a command center for your computer vision projects.