In the olden days, most people rolled their own servers to expose their ML models to client applications. In fact, Roboflow Inference's HTTP interface and REST API are built on FastAPI.
In this day and age, it's certainly still possible to start from scratch, but you'll be reinventing the wheel and running into footguns that others have already dealt with along the way. It's usually better and faster to use one of the existing ML-focused servers.
Choose FastAPI or Flask if: your main goal is learning the intricacies of making an inference server.
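If you do go the roll-your-own route, the sketch below shows roughly what a minimal hand-rolled FastAPI inference endpoint involves. The `fake_model` function is a stand-in for whatever framework actually backs your model, and the endpoint path and preprocessing choices are illustrative rather than prescriptive.

```python
# Minimal sketch of a hand-rolled inference server with FastAPI.
# `fake_model` is a placeholder for a real model; swap in your own framework.
import io

import numpy as np
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()


def fake_model(batch: np.ndarray) -> np.ndarray:
    # Stand-in "model": pretend it returns class scores for each image.
    return np.random.rand(batch.shape[0], 10)


@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Everything below (decoding, resizing, normalization, response shape)
    # is the wheel you end up reinventing when you start from scratch.
    raw = await file.read()
    image = Image.open(io.BytesIO(raw)).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.asarray(image, dtype=np.float32) / 255.0, axis=0)
    scores = fake_model(batch)
    return {"class_id": int(scores.argmax()), "score": float(scores.max())}
```

Even this toy version needs `uvicorn` to run and `python-multipart` for the upload, and it still has no batching, GPU management, model versioning, or monitoring; those are the pieces purpose-built servers handle for you.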
If you're deeply invested in the Tensorflow ecosystem and want to deploy a variety of Tensorflow models in different modalities like NLP, recommender systems, and audio in addition to CV models, Tensorflow Serving may be a good choice.
It can be complex to set up and maintain, and it lacks features many users would consider table stakes (like pre- and post-processing, which in many cases will need to be custom coded). Like several of the other servers listed here, it lacks depth in vision-specific functionality.
Choose Tensorflow Serving if: the Tensorflow ecosystem is very important to you and you're willing to put in the legwork to take advantage of its advanced feature set.
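To make the "custom coded pre- and post-processing" point above concrete, here is a rough sketch of a Python client for an image classifier sitting behind Tensorflow Serving's REST API. The model name (`resnet`), port, input size, and normalization below are assumptions based on TF Serving's defaults and a typical classification model; the server itself only shuttles tensors back and forth.

```python
# Sketch of a client for a model served by Tensorflow Serving's REST API.
# Model name, port, input size, and normalization are assumptions; adjust
# them to match the model you actually exported.
import numpy as np
import requests
from PIL import Image

SERVER_URL = "http://localhost:8501/v1/models/resnet:predict"  # default REST port


def classify(image_path: str) -> int:
    # Pre-processing you write (and keep in sync with training) yourself.
    image = Image.open(image_path).convert("RGB").resize((224, 224))
    batch = (np.asarray(image, dtype=np.float32) / 255.0)[np.newaxis, ...]

    response = requests.post(SERVER_URL, json={"instances": batch.tolist()})
    response.raise_for_status()

    # Post-processing is also yours: the server returns raw tensors
    # under a "predictions" key, nothing more.
    predictions = np.array(response.json()["predictions"])
    return int(predictions[0].argmax())


if __name__ == "__main__":
    print(classify("example.jpg"))
```

Vision-specific conveniences like mapping class indices to labels, drawing predictions, or handling different image encodings all end up in client code like this.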
Inference turns any computer or edge device into a command center for your computer vision projects.