Armenian-participle-phrase-punctuation
AlbertHakobyan070/Armenian-participle-phrase-punctuation
Code and data pipelines for my bachelor thesis on Armenian NLP
Summary
This repository contains the complete code, data pipelines, and trained models for a bachelor thesis project on Armenian NLP. The project focuses on automatic punctuation of Armenian participle clauses, framed as a sequence labeling task. It uses knowledge distillation: a large language model (Gemini 2.5 Flash) acts as the teacher and generates annotations for 112K sentences, which are then used to train smaller student models (BiLSTM, HyeBERT, mBERT, and an ensemble). The system is evaluated on human-annotated benchmarks, where the ensemble model comes close to the teacher model's performance.
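To make the sequence labeling framing concrete, here is a minimal sketch of how a punctuated sentence (for example, one produced by the teacher) could be turned into per-token training labels for a student model. The label names, the `PUNCT_LABELS` mapping, and the `sentence_to_labels` helper are assumptions made for illustration; the thesis code may use a different tokenization and label scheme.

```python
from typing import List, Tuple

# Hypothetical label set (an assumption, not taken from the repository):
# each token is tagged with the punctuation mark that follows it, or "O".
PUNCT_LABELS = {",": "COMMA", "՝": "BUT"}


def sentence_to_labels(punctuated: str) -> List[Tuple[str, str]]:
    """Convert a punctuated sentence into (token, label) pairs."""
    pairs: List[Tuple[str, str]] = []
    for raw in punctuated.split():
        label = "O"
        # Peel one trailing punctuation mark off the token, if present.
        if raw[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[raw[-1]]
            raw = raw[:-1]
        # Skip tokens that consisted of punctuation only.
        if raw:
            pairs.append((raw, label))
    return pairs


if __name__ == "__main__":
    # Placeholder tokens stand in for Armenian words.
    for token, label in sentence_to_labels("w1 w2, w3 w4՝ w5"):
        print(token, label)
```

Under this framing, the student model sees the unpunctuated tokens as input and predicts one label per token, so restoring punctuation amounts to reinserting the mark named by each non-"O" label.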