Tesseract is a widely used OCR engine, now developed primarily as an open-source project.
Originally developed by Hewlett-Packard (HP) between 1984 and 1994, it was created as a better alternative to the commercial OCR engines of the time, which “failed miserably”. In 1995, it ranked among the top three OCR engines for character accuracy. From 2006 to 2019, its development was sponsored by Google. Version 4 added a recognition engine based on LSTM (long short-term memory) neural networks, a machine learning technique. Version 5 was released in 2021.
Initially limited to English, it now supports over 100 languages natively and can be trained to recognize more.