armenian-corpus-core
RVogel101/armenian-corpus-core
Summary
A Python package for collecting, processing, and normalizing a Western Armenian language corpus. It provides a full ETL pipeline for scraping web content, storing it in MongoDB, and performing validation, deduplication, dialect tagging, and frequency analysis.
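The summary does not document the package's API. As a rough illustration of what the deduplication and frequency-analysis stages usually involve, here is a minimal stdlib-only sketch. Several things in it are assumptions. The functions `normalize`, `dedupe`, and `word_frequencies` are hypothetical names. The sketch operates on in-memory strings rather than MongoDB documents. It uses exact SHA-256 hashing of normalized text as the deduplication technique, which the package does not necessarily use.

```python
import hashlib
import re
import unicodedata
from collections import Counter
from typing import Iterable

# Hypothetical helpers for illustration only; not armenian-corpus-core's API.

_WS = re.compile(r"\s+")      # runs of whitespace
_TOKEN = re.compile(r"\w+")   # Unicode word characters, including Armenian letters


def normalize(text: str) -> str:
    """NFC-normalize, case-fold, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return _WS.sub(" ", text.casefold()).strip()


def dedupe(docs: Iterable[str]) -> list[str]:
    """Keep the first document for each SHA-256 hash of its normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def word_frequencies(docs: Iterable[str]) -> Counter:
    """Count word tokens across normalized documents.

    Armenian punctuation such as the full stop (U+0589) is not matched by \\w,
    so it is dropped from the tokens.
    """
    counts: Counter = Counter()
    for doc in docs:
        counts.update(_TOKEN.findall(normalize(doc)))
    return counts


if __name__ == "__main__":
    corpus = ["Բարեւ աշխարհ։", "բարեւ   աշխարհ։", "Բարեւ, Հայաստան։"]
    unique = dedupe(corpus)            # the second entry is a normalized duplicate
    print(unique)
    print(word_frequencies(unique).most_common())
```

Exact hashing only catches documents that are identical after normalization. Near-duplicate detection, such as MinHash over shingles, would need a different approach.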