ArmBench-LLM
Metricam/ArmBench-LLM
A comprehensive evaluation framework for benchmarking large language models (LLMs) on Armenian-language tasks.
Summary
ArmBench-LLM is a specialized evaluation framework for benchmarking large language models on Armenian language tasks. It supports multiple Armenian-specific datasets (language, literature, history) and the Armenian version of MMLU-Pro, offering both vLLM-optimized and Hugging Face inference. The project includes configuration management, result generation, and submission to a public leaderboard.
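To make the evaluation flow concrete, here is a minimal sketch of scoring a model on multiple-choice items in the style of the Armenian MMLU-Pro tasks described above. The item format, the stub model, and the `evaluate` helper are illustrative assumptions, not ArmBench-LLM's actual API; in the real framework the stub would be replaced by a vLLM or Hugging Face inference call.

```python
# Sketch of multiple-choice benchmark scoring (illustrative only; not the
# ArmBench-LLM API). A real run would swap `stub_model` for a vLLM or
# Hugging Face inference backend.
from dataclasses import dataclass


@dataclass
class Item:
    question: str        # Armenian question text
    choices: list[str]   # answer options
    answer: int          # index of the correct option


def stub_model(question: str, choices: list[str]) -> int:
    """Placeholder for an LLM call: always picks the first option,
    just so this sketch runs without model weights."""
    return 0


def evaluate(items: list[Item], model) -> float:
    """Return the model's accuracy over a list of items."""
    correct = sum(model(it.question, it.choices) == it.answer for it in items)
    return correct / len(items)


# Two toy items (illustrative, not from the benchmark datasets).
items = [
    Item("Ո՞րն է Հայաստանի մայրաքաղաքը:", ["Գյումրի", "Երևան", "Վանաձոր"], 1),
    Item("Ո՞ր թվականին է ստեղծվել հայոց այբուբենը:", ["405", "301", "1991"], 0),
]

print(evaluate(items, stub_model))  # → 0.5 (one of two items correct)
```

The accuracy number produced this way is what a result file would record before submission to the leaderboard.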