WesternArmenianLLM

RVogel101/WesternArmenianLLM

Python | Stars: 0 | Forks: 0 | ML/AI

Summary

A project to build a bilingual Western Armenian-English large language model by QLoRA fine-tuning Qwen 2.5 1.5B. Its data pipeline reads from a centralized MongoDB database, cleans the text, filters it by language, and prepares training splits. The workflow requires an audit for Eastern Armenian text leakage, trains in two stages (pretraining, then instruction fine-tuning), and plans for RAG and model serving.
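The cleaning, filtering, audit, and split steps described above might look roughly like the sketch below. It is not taken from the repository: every function name, the 0.8 script-ratio threshold, and the single orthographic marker are assumptions made for illustration. An in-memory list stands in for documents read from MongoDB, and only the Armenian side of the bilingual corpus is shown. The marker is the reformed-orthography spelling of a common suffix, which is typical of Eastern Armenian as written in Armenia; a real audit would need a much broader check.

```python
import random
import re
import unicodedata

# Armenian capital and small letters (U+0531-U+0556, U+0561-U+0587).
ARMENIAN_LETTER = re.compile(r"[\u0531-\u0556\u0561-\u0587]")

# Assumption for illustration only: the reformed-orthography suffix is used
# as a rough hint of Eastern Armenian text. One marker is far from a full audit.
REFORMED_MARKERS = ("ություն",)


def clean_text(text: str) -> str:
    """Normalize to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def armenian_ratio(text: str) -> float:
    """Share of alphabetic characters that are Armenian letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if ARMENIAN_LETTER.match(ch))
    return hits / len(letters)


def flag_possible_eastern(text: str) -> bool:
    """True if the text contains any of the illustrative markers."""
    return any(marker in text for marker in REFORMED_MARKERS)


def prepare(raw_docs, min_ratio=0.8):
    """Clean, keep mostly-Armenian documents, and set aside flagged ones."""
    kept, flagged = [], []
    for raw in raw_docs:
        text = clean_text(raw)
        if armenian_ratio(text) < min_ratio:
            continue  # not predominantly Armenian script
        (flagged if flag_possible_eastern(text) else kept).append(text)
    return kept, flagged


def train_val_split(docs, val_fraction=0.1, seed=0):
    """Deterministic shuffle, then split into (train, validation)."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Flagged documents are returned separately instead of being dropped, so they can be reviewed by hand before the training splits are finalized.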
