Low-resource-Armenian-NLP
levongevorgian/Low-resource-Armenian-NLP
Summary
A research project that investigates and improves tokenization efficiency for Armenian, a low-resource language. It analyzes how existing tokenizers perform on Armenian text, trains Armenian-specific tokenizers, grafts the new vocabulary into a Qwen2.5-0.5B model, and evaluates how well LoRA fine-tuning recovers the model's performance after grafting. The work is organized as a multi-goal research study with code, notebooks, and a final report.
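
To make "tokenization efficiency" concrete, here is a minimal sketch of two metrics commonly used for this kind of analysis: fertility (tokens per whitespace-separated word) and characters per token. It is not taken from the repository. The function names `tokenizer_efficiency` and `utf8_byte_tokenize` are hypothetical, and the byte-per-token tokenizer is only a stand-in for a real one. It does show one reason Armenian can be expensive for byte-level vocabularies: each Armenian letter takes two bytes in UTF-8.

```python
from typing import Callable, Dict, List, Sequence


def tokenizer_efficiency(
    texts: List[str], tokenize: Callable[[str], Sequence]
) -> Dict[str, float]:
    """Compute fertility (tokens per word) and characters per token.

    `tokenize` is any callable mapping a string to a sequence of tokens,
    so the same function works for a stand-in or a real tokenizer.
    """
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)  # whitespace-delimited words
    n_chars = sum(len(t) for t in texts)          # Unicode code points
    if n_tokens == 0 or n_words == 0:
        raise ValueError("need at least one word and one token")
    return {
        "tokens_per_word": n_tokens / n_words,
        "chars_per_token": n_chars / n_tokens,
    }


def utf8_byte_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token per UTF-8 byte.

    This approximates the worst case of a byte-level vocabulary that has
    learned no merges for the script in question.
    """
    return list(text.encode("utf-8"))


armenian = ["Բարև աշխարհ"]  # "Hello world" in Armenian
english = ["Hello world"]

hy = tokenizer_efficiency(armenian, utf8_byte_tokenize)
en = tokenizer_efficiency(english, utf8_byte_tokenize)

print("Armenian:", hy)
print("English: ", en)
```

To measure a real tokenizer instead, the same function can be given something like `lambda s: tok.encode(s, add_special_tokens=False)`, where `tok` is assumed to be a Hugging Face tokenizer (for example, the one shipped with Qwen2.5-0.5B) loaded with `AutoTokenizer.from_pretrained`. A lower tokens-per-word value for an Armenian-specific tokenizer on the same text would indicate better efficiency.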