Armenian Language Data

These projects provide datasets, dictionaries, and tools for processing and analyzing the Armenian language.

11 projects

A repository containing Armenian dictionary data files and a Makefile to compile them into the StarDict format. It ag...

A curated multilingual dataset of Armenian and Armenia-related keywords, names, and geographic terms designed for fil...

A repository containing lists of Armenian words, including a romanization guide and word lists in various formats (CS...

A digitized version of the classic Bararan English-Armenian dictionary containing 27,001 entries. The data was conver...

A repository containing Armenian (Western) vocabulary data for the Vocably language-learning app. The data is primari...

A repository containing Armenian vocabulary data (words and translations) automatically generated and updated for the...

A curated list of 316 Armenian stopwords for NLP text preprocessing, provided as a JSON file with usage examples for ...
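Stopword lists like this are typically applied after tokenization to drop high-frequency function words. Below is a minimal sketch. It assumes the JSON file is a flat array of strings. The file name `armenian_stopwords.json` and the simple `\w+` tokenizer are hypothetical choices for illustration, not taken from the repository.

```python
import json
import re

# Hypothetical file name; the repository's actual layout is not stated here.
STOPWORDS_PATH = "armenian_stopwords.json"

# In Python 3, \w matches Unicode letters, including the Armenian alphabet.
# Armenian punctuation such as the full stop "։" (U+0589) is not matched.
TOKEN_RE = re.compile(r"\w+")


def load_stopwords(path: str) -> set[str]:
    """Load stopwords, assuming the JSON file is a flat array of strings."""
    with open(path, encoding="utf-8") as f:
        # Lowercase defensively so lookups match lowercased tokens.
        return {word.lower() for word in json.load(f)}


def remove_stopwords(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase and tokenize the text, then drop tokens in the stopword set."""
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in stopwords]


print(remove_stopwords("Սա մի գիրք է։", {"մի", "է"}))  # ['սա', 'գիրք']
```

A real pipeline would pass `load_stopwords(STOPWORDS_PATH)` instead of the inline set. A plain `set` keeps each lookup constant-time regardless of list size.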

A dataset of 30,000 Armenian news articles scraped from websites, categorized into six topics (Army, Political, Econo...

A version of the DailyDialog dataset translated into Eastern Armenian, formatted as sequential sentence pairs (input/...
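One common way to turn a multi-turn dialogue into sequential pairs is to pair each utterance with the one that follows it. The sketch below assumes that scheme and assumes a dialogue is stored as an ordered list of strings. Neither assumption is confirmed by the entry above, and `dialogue_to_pairs` is a hypothetical helper.

```python
def dialogue_to_pairs(turns: list[str]) -> list[tuple[str, str]]:
    """Pair each utterance with the one that follows it.

    Assumption: a dialogue is an ordered list of utterances. A dialogue
    with n turns yields n - 1 pairs; fewer than two turns yields none.
    """
    # zip stops at the shorter sequence, so the last turn is never an input.
    return list(zip(turns, turns[1:]))


dialogue = ["Բարեւ։", "Բարեւ, ինչպե՞ս ես։", "Լավ եմ, շնորհակալություն։"]
for source, target in dialogue_to_pairs(dialogue):
    print(source, "->", target)
```

With this scheme, every middle utterance appears twice: once as a target and once as an input.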

A dataset containing 100,000 Armenian sentences, formatted as a CSV or text file, intended for training and evaluatin...

A project for creating Armenian OCR datasets by scraping Armenian Wiktionary, processing words into lowercase/upperca...
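Generating lowercase and uppercase forms of each scraped word can rely on Python's built-in Unicode case mapping, which covers the Armenian alphabet. The following is a minimal sketch. The function name `case_variants` and the extra capitalized form are illustrative assumptions, not the project's documented behavior.

```python
def case_variants(word: str) -> list[str]:
    """Return the distinct lowercase, uppercase, and capitalized forms of a word.

    Python's case mapping pairs Armenian lowercase letters (U+0561-U+0586)
    with their uppercase counterparts (U+0531-U+0556). The ligature "և"
    has no single uppercase letter; str.upper() expands it to "ԵՒ".
    """
    variants = [word.lower(), word.upper(), word.capitalize()]
    # dict.fromkeys removes duplicates while preserving the original order.
    return list(dict.fromkeys(variants))


print(case_variants("բարեւ"))  # ['բարեւ', 'ԲԱՐԵՒ', 'Բարեւ']
```

Because "և" expands to two letters when uppercased, a word's uppercase form can be longer than the original. This matters if OCR labels are aligned character by character.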