epg-embedding-benchmark

s1mb1o/epg-embedding-benchmark

Evaluating sentence embedding models for cross-lingual TV program guide matching across English, Russian, and Armenian

Jupyter Notebook Stars: 1 Forks: 0 License: MIT ML/AI

multilingual-embeddings sentence-embeddings benchmark recommendation-system low-resource-language information-retrieval jupyter-notebook machine-learning armenian-nlp

Summary

This repository presents a benchmark for evaluating multilingual sentence embedding models on a real-world task: cross-lingual matching of TV program guide (EPG) entries across English, Russian, and Armenian. It focuses on the practical challenge of aligning content in a low-resource language (Armenian) within a recommendation system. The project includes a detailed dataset analysis, evaluation of multiple models (including E5, LaBSE, and OpenAI embeddings), and key findings on the divergence between alignment and retrieval performance.
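The gap between alignment and retrieval mentioned above can be illustrated with a small sketch. The summary does not say how the repository defines either metric, so this assumes alignment means the mean cosine similarity between each entry and its true translation, and retrieval means recall@1 over all candidate translations. The function names and the toy 2-D vectors are hypothetical and are not taken from the project's notebooks or dataset.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def alignment_score(src, tgt):
    """Assumed alignment metric: mean cosine similarity of true pairs (row i with row i)."""
    src, tgt = l2_normalize(src), l2_normalize(tgt)
    return float(np.mean(np.sum(src * tgt, axis=1)))


def recall_at_1(src, tgt):
    """Assumed retrieval metric: share of source rows whose nearest target is the true pair."""
    src, tgt = l2_normalize(src), l2_normalize(tgt)
    sims = src @ tgt.T                      # pairwise cosine similarity matrix
    predicted = np.argmax(sims, axis=1)     # index of the best-matching target per source
    return float(np.mean(predicted == np.arange(len(src))))


# Made-up embeddings for three program entries in two languages.
# Rows 0 and 1 are near-duplicates (think two episodes of one show),
# so their true pairs are close but each other's pair is even closer.
src = np.array([[1.00, 0.00],
                [0.98, 0.20],
                [0.00, 1.00]])
tgt = np.array([[0.97, 0.25],
                [1.00, 0.05],
                [0.10, 1.00]])

print("alignment:", round(alignment_score(src, tgt), 3))
print("recall@1: ", round(recall_at_1(src, tgt), 3))
```

With these toy vectors every true pair has cosine similarity above 0.95, yet only one of the three entries retrieves its own translation first. A model can therefore score well on pairwise alignment while still ranking near-duplicate entries incorrectly.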

Similar Projects

ArmBench-TextEmbed

Metric-AI-Lab/ArmBench-TextEmbed

ArmBench-TextEmbed is a specialized Python benchmark for evaluating the performance of text embedding models on the A...

Python Stars: 4

ArmBench-LLM

Metricam/ArmBench-LLM

ArmBench-LLM is a specialized evaluation framework for benchmarking large language models on Armenian language tasks....

Python Stars: 6

Exploring-the-Linguistic-Efficiency-of-Large-Language-Models-in-Armenian-Discourse

Anahit-N/Exploring-the-Linguistic-Efficiency-of-Large-Language-Models-in-Armenian-Discourse

A capstone project evaluating GPT-3.5-Turbo's performance on Armenian language tasks, including extractive QA, multip...

Jupyter Notebook Stars: 0

word2vec_arm

QnarikP/word2vec_arm

A complete pipeline for training Word2Vec embeddings on Armenian text, including data preprocessing, model training w...

Jupyter Notebook Stars: 0

armenian_text_embedding_demo

BagratMinasyan/armenian_text_embedding_demo

A demonstration repository for a pre-trained Armenian text embedding model, showcasing applications in text classific...

Jupyter Notebook Stars: 0

ArmenianLanguegeAutocomplete

madanela/ArmenianLanguegeAutocomplete

A project exploring Armenian language autocomplete using multiple NLP approaches including Word2Vec, LSTM, BERT trans...

Jupyter Notebook Stars: 2