armenian-corpus-core

RVogel101/armenian-corpus-core

Python | Stars: 0 | Forks: 0 | Language/NLP

Summary

A Python package for collecting, processing, and normalizing a Western Armenian text corpus. Its ETL pipeline scrapes web content, stores it in MongoDB, and then validates, deduplicates, dialect-tags, and runs frequency analysis on the collected text.
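To make the processing stages more concrete, here is a minimal sketch of normalization, deduplication, and frequency counting using only the Python standard library. It is not the package's actual API: the function names `normalize_text`, `deduplicate`, and `word_frequencies` are hypothetical, documents are held in memory rather than in MongoDB, and matching on the Armenian Unicode letter ranges is an assumption about how tokens might be extracted.

```python
import hashlib
import re
import unicodedata
from collections import Counter

# Assumption: a "word" is a run of Armenian letters
# (uppercase U+0531-U+0556, lowercase and ligature U+0561-U+0587).
ARMENIAN_WORD = re.compile(r"[\u0531-\u0556\u0561-\u0587]+")


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing by SHA-256 of the normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        cleaned = normalize_text(doc)
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if cleaned and digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique


def word_frequencies(docs: list[str]) -> Counter:
    """Count lowercased Armenian word tokens across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(ARMENIAN_WORD.findall(doc.lower()))
    return counts


# Hypothetical sample input: the first two strings differ only in whitespace.
sample = ["Բարեւ  աշխարհ", "Բարեւ աշխարհ", "Բարեւ հայեր"]
unique_docs = deduplicate(sample)
frequencies = word_frequencies(unique_docs)
print(len(unique_docs), frequencies.most_common(1))
```

Hashing the normalized text rather than the raw text means that copies differing only in whitespace or Unicode composition count as duplicates. A real pipeline would likely also need near-duplicate detection and dialect-specific handling, which this sketch does not attempt.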

Similar Projects