These projects collect datasets, dictionaries, and tools for processing and analyzing the Armenian language.
A repository containing Armenian dictionary data files and a Makefile to compile them into the StarDict format. It ag...
A curated multilingual dataset of Armenian and Armenia-related keywords, names, and geographic terms designed for fil...
A repository containing lists of Armenian words, including a romanization guide and word lists in various formats (CS...
A digitized version of the classic Bararan English-Armenian dictionary containing 27,001 entries. The data was conver...
A repository containing Armenian (Western) vocabulary data for the Vocably language-learning app. The data is primari...
A repository containing Armenian vocabulary data (words and translations) automatically generated and updated for the...
A curated list of 316 Armenian stopwords for NLP text preprocessing, provided as a JSON file with usage examples for ...
A dataset of 30,000 Armenian news articles scraped from websites, categorized into six topics (Army, Political, Econo...
A translated version of the DailyDialog dataset into Eastern Armenian, formatted as sequential sentence pairs (input/...
A dataset containing 100,000 Armenian sentences, formatted as a CSV or text file, intended for training and evaluatin...
A project for creating Armenian OCR datasets by scraping Armenian Wiktionary, processing words into lowercase/upperca...
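
As a minimal sketch of how a stopword list like the one described above might be used in text preprocessing, the Python snippet below filters stopwords out of an Armenian sentence. It assumes the JSON file is a flat array of strings, which the descriptions here do not confirm. The four sample stopwords and the function names load_stopwords and remove_stopwords are illustrative only and are not taken from any of the listed repositories.

```python
import json
import re

# Hypothetical stand-in for the contents of a stopword JSON file.
# Assumption: the file is a flat JSON array of strings.
SAMPLE_JSON = '["և", "է", "որ", "այս"]'

# Armenian capital letters (U+0531-U+0556) and small letters,
# including the ligature "և" (U+0561-U+0587).
ARMENIAN_WORD = re.compile(r"[\u0531-\u0556\u0561-\u0587]+")


def load_stopwords(json_text):
    """Parse a JSON array of words into a lowercase set."""
    return {word.lower() for word in json.loads(json_text)}


def remove_stopwords(text, stopwords):
    """Return the Armenian tokens of `text` that are not stopwords."""
    tokens = ARMENIAN_WORD.findall(text)
    return [tok for tok in tokens if tok.lower() not in stopwords]


if __name__ == "__main__":
    stopwords = load_stopwords(SAMPLE_JSON)
    print(remove_stopwords("Այս գիրքը լավ է և հետաքրքիր", stopwords))
    # prints ['գիրքը', 'լավ', 'հետաքրքիր']
```

Matching on the Armenian Unicode ranges rather than splitting on whitespace keeps punctuation and Latin-script text out of the token list. Lowercasing both the stopwords and each token before comparison lets a sentence-initial capital such as "Այս" still match its stopword entry.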