haratch-ocr
v4nn4/haratch-ocr
Armenian OCR on Haratch (Յառաջ).
Summary
A specialized OCR tool for digitizing historical Armenian newspapers from the Haratch archive. It automates PDF downloads, detects page layout with DocLayout-YOLO, and extracts Armenian text using a custom Tesseract model (hye-calfa-n). Optional AI translation features are also included.
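
The step between layout detection and text extraction can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Region` type, the `reading_order` and `ocr_page` helpers, the label names, and the column-bucketing heuristic are all assumptions made for the example. It takes the boxes a layout detector would return, orders them column by column, and passes each text region to an OCR callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# (left, top, right, bottom) in pixels, the same order Pillow's Image.crop expects.
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """One detected layout region (hypothetical type for this sketch)."""
    box: Box
    label: str  # assumed label names, e.g. "text", "title", "figure"


def reading_order(regions: Iterable[Region], column_width: int) -> List[Region]:
    """Sort regions left to right by column bucket, then top to bottom.

    Bucketing by `left // column_width` is a naive heuristic for
    multi-column newspaper pages, assumed here for illustration.
    """
    return sorted(regions, key=lambda r: (r.box[0] // column_width, r.box[1]))


def ocr_page(
    regions: Iterable[Region],
    crop_and_ocr: Callable[[Box], str],
    column_width: int = 500,
    text_labels: Tuple[str, ...] = ("text", "title"),
) -> str:
    """Run OCR on text-like regions in reading order and join the results."""
    ordered = reading_order(regions, column_width)
    parts = [crop_and_ocr(r.box).strip() for r in ordered if r.label in text_labels]
    # Drop empty results so blank regions do not add stray paragraph breaks.
    return "\n\n".join(p for p in parts if p)
```

With Pillow and pytesseract installed, and assuming the hye-calfa-n traineddata file is available to Tesseract, `crop_and_ocr` could be something like `lambda box: pytesseract.image_to_string(page.crop(box), lang="hye-calfa-n")`, where `page` is a PIL image of the rendered PDF page. Passing the OCR step in as a callable keeps the ordering logic testable without Tesseract.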