Armenian Language Data

These projects provide datasets, dictionaries, and tools for processing and analyzing the Armenian language.

11 projects
noch-armenian-dictionary
norayr/noch-armenian-dictionary

A repository containing Armenian dictionary data files and a Makefile to compile them into the StarDict format. It ag...

Makefile Stars: 11
armenian-keywords
opendataam/armenian-keywords

A curated multilingual dataset of Armenian and Armenia-related keywords, names, and geographic terms designed for fil...

Stars: 3
armenian-words
shay-ellison/armenian-words

A repository containing lists of Armenian words, including a romanization guide and word lists in various formats (CS...

Python Stars: 2
baratian-dictionary
bararan-hay/baratian-dictionary

A digitized version of the classic Bararan English-Armenian dictionary containing 27,001 entries. The data was conver...

Stars: 2
hyw
vocably/hyw

A repository containing Western Armenian vocabulary data for the Vocably language-learning app. The data is primari...

Stars: 0
hy
vocably/hy

A repository containing Armenian vocabulary data (words and translations) automatically generated and updated for the...

Stars: 0
-stopwords-hy-
Albert-Ananyan/-stopwords-hy-

A curated list of 316 Armenian stopwords for NLP text preprocessing, provided as a JSON file with usage examples for ...

Stars: 0
Armenian-News-Dataset
erantonyan24/Armenian-News-Dataset

A dataset of 30,000 Armenian news articles scraped from websites, categorized into six topics (Army, Political, Econo...

Stars: 0
daily_dialog_armenian
Evrikia/daily_dialog_armenian

An Eastern Armenian translation of the DailyDialog dataset, formatted as sequential sentence pairs (input/...

Stars: 0
arm_sentences_100-000
Evrikia/arm_sentences_100-000

A dataset containing 100,000 Armenian sentences, formatted as a CSV or text file, intended for training and evaluatin...

Stars: 0
Armenian-Words-Lexicon-and-OCR-Dataset
AtecAi/Armenian-Words-Lexicon-and-OCR-Dataset

A project for creating Armenian OCR datasets by scraping Armenian Wiktionary, processing words into lowercase/upperca...

Python Stars: 0