Armenian-Words-Lexicon-and-OCR-Dataset
AtecAi/Armenian-Words-Lexicon-and-OCR-Dataset
A comprehensive Armenian word dataset and lexicon for OCR and NLP tasks. The repository includes scripts that scrape words from Armenian Wiktionary, convert them to lowercase, uppercase, and capitalized forms, and generate word images for OCR training. It is intended for building Armenian language processing tools.
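The case-modification step described above can be sketched with Python's built-in string methods, which handle the Armenian Unicode block. This is a minimal illustration, not the repository's actual script: the function name `case_variants` and the sample word are assumptions made for the example.

```python
from typing import Dict


def case_variants(word: str) -> Dict[str, str]:
    """Return the three case forms of a word (hypothetical helper)."""
    return {
        "lower": word.lower(),              # all letters lowercase
        "upper": word.upper(),              # all letters uppercase
        "capitalized": word.capitalize(),   # first letter upper, rest lower
    }


if __name__ == "__main__":
    # "հայերեն" ("Armenian") is just a sample input.
    for form, text in case_variants("հայերեն").items():
        print(form, text)
```

One thing to keep in mind: the ligature "և" has no single-character uppercase form, so `str.upper()` expands it to two letters and the uppercase string can be longer than the original.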
Summary
A project for creating Armenian OCR datasets by scraping Armenian Wiktionary, processing words into lowercase/uppercase/capitalized forms, and generating synthetic word images with font variations and augmentations for training OCR models.
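One way to organize the image-generation stage is to first build a plan of rendering jobs, one per combination of case form, font, and randomly sampled augmentation parameters, and then hand each job to whatever image library does the drawing. The sketch below covers only that planning step using the standard library. The names `RenderJob` and `plan_render_jobs`, the font file names, and the rotation and blur ranges are all assumptions for illustration, not taken from the repository.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RenderJob:
    """One synthetic image to render (hypothetical structure)."""
    text: str            # the word in a specific case form
    font: str            # path or name of the font file to use
    rotation_deg: float  # small random rotation, in degrees
    blur_radius: float   # blur strength, in pixels


def plan_render_jobs(
    words: Sequence[str],
    fonts: Sequence[str],
    seed: int = 0,
    max_rotation: float = 5.0,  # assumed range, tune for your data
    max_blur: float = 1.5,      # assumed range, tune for your data
) -> List[RenderJob]:
    """Enumerate word x font x case-form jobs with sampled augmentations."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    jobs = []
    for word, font in itertools.product(words, fonts):
        for text in (word.lower(), word.upper(), word.capitalize()):
            jobs.append(
                RenderJob(
                    text=text,
                    font=font,
                    rotation_deg=round(rng.uniform(-max_rotation, max_rotation), 2),
                    blur_radius=round(rng.uniform(0.0, max_blur), 2),
                )
            )
    return jobs


if __name__ == "__main__":
    # Font file names here are placeholders, not files shipped with the project.
    for job in plan_render_jobs(["հայերեն"], ["FontA.ttf", "FontB.ttf"]):
        print(job)
```

Keeping the plan separate from the rendering makes the dataset reproducible from a seed and lets the ground-truth label (`text`) travel with each image's parameters.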