WesternArmenianLLM
RVogel101/WesternArmenianLLM
Summary
A project to build a bilingual Western Armenian-English large language model by QLoRA fine-tuning Qwen 2.5 1.5B. Its data pipeline reads from a centralized MongoDB database, cleans the text, filters it by language, and prepares training splits. The workflow includes a mandatory audit for Eastern Armenian text leakage and a two-stage training process (pretraining followed by instruction fine-tuning); RAG and model serving are planned.
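The language filtering and Eastern Armenian leakage audit described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual audit: the function names, the `max_cues` threshold, and the single grammatical cue are all assumptions. The cue is the Eastern Armenian present-tense pattern of an "-ում" participle followed by an auxiliary (as in "գրում եմ"), where Western Armenian would typically use "կը գրեմ". A single regex like this will miss other word orders and can misfire on nouns ending in "-ում", so flagged documents are best treated as candidates for review rather than confirmed leakage.

```python
import re

# Armenian Unicode block (U+0530-U+058F) plus Armenian ligatures (U+FB13-U+FB17).
ARMENIAN_CHAR = re.compile(r"[\u0530-\u058F\uFB13-\uFB17]")

# Hypothetical Eastern Armenian cue (an assumption, not the project's rule set):
# a word ending in "-ում" followed by a present-tense auxiliary, e.g. "գրում եմ".
EASTERN_CUE = re.compile(r"\w+ում\s+(?:եմ|ես|է|ենք|եք|են)\b")


def armenian_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters written in Armenian script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    armenian = sum(1 for ch in letters if ARMENIAN_CHAR.match(ch))
    return armenian / len(letters)


def eastern_cue_count(text: str) -> int:
    """Count occurrences of the illustrative Eastern Armenian cue."""
    return len(EASTERN_CUE.findall(text))


def audit(docs, max_cues=0):
    """Split docs into (kept, flagged); flagged docs exceed max_cues cue hits."""
    kept, flagged = [], []
    for doc in docs:
        if eastern_cue_count(doc) > max_cues:
            flagged.append(doc)
        else:
            kept.append(doc)
    return kept, flagged
```

A call such as `audit(["Ես նամակ կը գրեմ", "Ես գրում եմ նամակ"])` would keep the first sentence and flag the second. `armenian_ratio` can serve as a coarse script filter ahead of the audit, for example to drop documents that are mostly non-Armenian before the English side of the bilingual corpus is handled separately.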