In the olden days, most people rolled their own servers to expose their ML models to client applications. In fact, Roboflow Inference's HTTP interface and REST API are built on FastAPI.
These days it's certainly still possible to start from scratch, but you'll be reinventing the wheel and hitting problems others have already solved along the way. It's usually better and faster to use one of the existing ML-focused servers.
Choose FastAPI or Flask if: your main goal is learning the intricacies of making an inference server.
The PyTorch ecosystem's equivalent of TensorFlow Serving is TorchServe. It's optimized for serving PyTorch models across several domains including vision, NLP, tabular data, and audio.
Like TensorFlow Serving, it is designed for large-scale cloud deployments and can require custom configuration for things like pre- and post-processing and deploying multiple models. Because of its wide mandate, it lacks many vision-specific features (like video streaming).
Choose TorchServe if: you're looking for a way to scale and customize the deployment of your PyTorch models and don't need vision-specific functionality.
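As a rough sketch of the TorchServe workflow: you package weights and a handler into a `.mar` archive, start the server against a model store, then hit the predictions endpoint. The file names (`model.pt`, `my_handler.py`, `image.jpg`) and model name `detector` below are placeholders for your own artifacts.

```shell
# 1. Package the trained weights and a custom handler into a .mar archive.
torch-model-archiver \
  --model-name detector \
  --version 1.0 \
  --serialized-file model.pt \
  --handler my_handler.py \
  --export-path model_store

# 2. Launch TorchServe pointing at the model store.
torchserve --start --model-store model_store --models detector=detector.mar

# 3. Query the inference endpoint (port 8080 by default).
curl http://localhost:8080/predictions/detector -T image.jpg
```

The custom handler in step 1 is where the pre- and post-processing configuration mentioned above lives, and it's the part you'd have to write yourself for anything beyond the built-in handlers.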
Inference turns any computer or edge device into a command center for your computer vision projects.