haratch-ocr

v4nn4/haratch-ocr

Armenian OCR on Haratch (Յառաջ).

Python | Stars: 0 | Forks: 0 | License: MIT | Tools

Summary

haratch-ocr is a specialized OCR tool for digitizing historical Armenian newspapers from the Haratch archive. It automates downloading the PDFs, detects page layout with DocLayout-YOLO, and extracts Armenian text using a custom Tesseract model (hye-calfa-n). It also includes optional AI translation features.
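The layout-then-OCR step described above might look roughly like the following sketch. It is not taken from the repository: the `Box` type, the `reading_order` and `tesseract_command` helpers, the `column_width` parameter, and the file names are all hypothetical. It assumes a layout detector has already produced pixel bounding boxes for a multi-column page, that each region has been cropped to its own image file, and that the `tesseract` CLI is installed with the `hye-calfa-n` traineddata available.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    """A detected layout region in pixel coordinates (hypothetical structure)."""
    x0: int
    y0: int
    x1: int
    y1: int


def reading_order(boxes: List[Box], column_width: int) -> List[Box]:
    """Sort regions column by column (left to right), then top to bottom.

    Assumes columns of roughly equal width; column_width is a guess the
    caller must supply for the page being processed.
    """
    return sorted(boxes, key=lambda b: (b.x0 // column_width, b.y0))


def tesseract_command(
    image_path: str,
    lang: str = "hye-calfa-n",  # model name as given in the project summary
    tessdata_dir: Optional[str] = None,
    psm: int = 6,  # page segmentation mode 6: a single uniform block of text
) -> List[str]:
    """Build a Tesseract CLI invocation that writes recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if tessdata_dir is not None:
        # Directory holding the custom .traineddata file, if not the default.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def ocr_region(image_path: str, **kwargs) -> str:
    """Run Tesseract on one cropped region image and return its text."""
    result = subprocess.run(
        tesseract_command(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Three made-up regions on a two-column page, each column 500 px wide.
    regions = [Box(510, 40, 990, 300), Box(10, 400, 490, 800), Box(10, 20, 490, 380)]
    for box in reading_order(regions, column_width=500):
        print(box)
    print(" ".join(tesseract_command("region_0.png")))
```

Sorting by column first matters for newspaper pages, where reading top to bottom across the full page width would interleave unrelated articles. The project itself may order regions differently or call Tesseract through a Python binding rather than the CLI.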

Similar Projects