Surya is a Python package for optical character recognition (OCR) and document layout analysis. It supports multiple languages and identifies layout elements such as tables, images, and headers. It is open source and hosted on GitHub, where it has over 5,000 stars.
In benchmarks published in its repository, it outperforms Tesseract in both inference time and accuracy, and it also performs well at layout recognition and analysis.