ArmBench-LLM

Metricam/ArmBench-LLM

An evaluation framework for benchmarking large language models (LLMs) on Armenian-language tasks.

Language: Python · Stars: 6 · Forks: 0 · Topic: ML/AI

Summary

ArmBench-LLM is a specialized evaluation framework for benchmarking large language models on Armenian language tasks. It supports multiple Armenian-specific datasets (language, literature, history) and the Armenian version of MMLU-Pro, offering both vLLM-optimized and Hugging Face inference. The project includes configuration management, result generation, and submission to a public leaderboard.
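The core of any such benchmark harness is scoring model outputs against gold answers. As an illustration only, a minimal scorer for MMLU-Pro-style multiple-choice items might look like the sketch below; the function and variable names are hypothetical and do not reflect ArmBench-LLM's actual API.

```python
def score_multiple_choice(predictions, gold):
    """Return accuracy over MMLU-Pro-style letter answers (e.g. A-J).

    Comparison is case-insensitive and ignores surrounding whitespace,
    so a model output of " c " matches a gold answer of "C".
    """
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)

# Example: three of four illustrative answers match the gold labels.
accuracy = score_multiple_choice(["A", "c", "B", "D"], ["A", "C", "B", "A"])
print(accuracy)  # 0.75
```

In a real run, `predictions` would come from the vLLM or Hugging Face inference backend and `gold` from one of the Armenian datasets; the project's own result-generation step would then write these scores out for leaderboard submission.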