This cluster collects datasets, corpora, and treebanks for the Armenian language, covering its historical and modern, spoken and written forms, in support of computational linguistics, NLP, and digital humanities research.
The PROIEL Treebank is a linguistic dataset containing dependency treebank annotations for texts in ancient Indo-Euro...
A Universal Dependencies treebank for Eastern Armenian, providing manually annotated morphological and syntactic data...
A curated speech corpus of Armenian question-answer dialogues designed for intonation and prosody studies. It contain...
This repository contains cleaned TextGrid transcript files for the ReRooted Archive, a corpus of Syrian Armenian refu...
A Universal Dependencies (UD) treebank for Western Armenian, containing manually annotated morphological and syntacti...
ArmTDP-NER is a manually annotated gold-standard named entity recognition (NER) corpus for Modern Eastern Armenian, c...
A curated list of Armenian language datasets, corpora, models, and digital resources for NLP and computational lingui...
This repository is part of the AI2001 project and holds linguistic datasets for the Armenian language. The README sta...
A Universal Dependencies treebank for Classical Armenian, containing annotated texts from the Gospels and Movses Khor...
A Universal Dependencies treebank for Eastern Armenian, manually annotated from the ArmTDP v2.0 corpus. It includes e...
A dataset repository for a stylometric study on Classical Armenian texts, specifically for authorship attribution of ...
This repository is part of the TITUS-2-0 project, which hosts digital editions of historical texts in various languag...
A Universal Dependencies treebank for Middle Armenian, manually annotated with morphological and syntactic data, deri...