If you're deeply invested in the TensorFlow ecosystem and want to deploy a variety of TensorFlow models across modalities like NLP, recommender systems, and audio in addition to CV models, TensorFlow Serving may be a good choice.
It can be complex to set up and maintain, and it lacks features many users would consider table stakes (like pre- and post-processing, which in many cases will need to be custom coded). Like several of the other servers listed here, it lacks depth in vision-specific functionality.
Choose TensorFlow Serving if: the TensorFlow ecosystem is very important to you and you're willing to put in the legwork to take advantage of its advanced feature set.
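To make the "custom coded" point concrete: once a SavedModel is exported and the TensorFlow Serving container is running, clients call its REST predict endpoint directly. Here is a minimal client sketch; the server address, model name (`my_model`), and input shape are placeholder assumptions, and note that all pre-processing happens on the client side.

```python
import requests

# TensorFlow Serving exposes a REST predict endpoint at
# /v1/models/<model_name>:predict. The model name here is hypothetical.
SERVER_URL = "http://localhost:8501/v1/models/my_model:predict"

# Any pre-processing (resizing, normalization, etc.) must be done
# client-side; TensorFlow Serving will not do it for you.
payload = {"instances": [[0.0] * 784]}  # dummy input; shape depends on your model

response = requests.post(SERVER_URL, json=payload)
response.raise_for_status()

# The server responds with {"predictions": [...]}.
print(response.json()["predictions"])
```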
LitServe is a lightweight, customizable inference server focused on serving models with minimal overhead. It is spare by design, but flexible and self-contained.
Like Triton, LitServe is task-agnostic, meaning it is designed to balance the needs of vision models against those of NLP, audio, and tabular models. As a result, it's not as feature-rich for computer vision applications (for example, it has no built-in support for streaming video). It also focuses narrowly on model serving, without an abstraction layer like Workflows (offered by Roboflow Inference) for chaining models and integrating with other tools.
Choose LitServe if: you are working on general-purpose machine learning tasks and were previously considering rolling your own server but want a more featureful starting point.
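To give a feel for that starting point, here is a minimal sketch following LitServe's LitAPI pattern. The model (a trivial doubling function) and the request field name are stand-ins for whatever you would actually serve.

```python
import litserve as ls

class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load your model once per worker; a trivial stand-in here.
        self.model = lambda x: x * 2

    def decode_request(self, request):
        # Pre-processing is hand-rolled: pull the input out of the JSON body.
        return request["input"]

    def predict(self, x):
        return self.model(x)

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto")
    server.run(port=8000)
```

A client would then POST a body like `{"input": 4}` to `http://localhost:8000/predict`. Everything beyond this skeleton, such as batching configuration or vision-specific pre-processing, is yours to build.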
Roboflow Inference turns any computer or edge device into a command center for your computer vision projects.