epg-embedding-benchmark
s1mb1o/epg-embedding-benchmark
Evaluating sentence embedding models for cross-lingual TV program guide matching across English, Russian, and Armenian
Summary
This repository presents a benchmark for evaluating multilingual sentence embedding models on a real-world task: cross-lingual matching of TV electronic program guide (EPG) entries across English, Russian, and Armenian. It focuses on the practical challenge of aligning content in a low-resource language (Armenian) within a recommendation system. The project includes a detailed dataset analysis, an evaluation of multiple models (including E5, LaBSE, and OpenAI embeddings), and key findings on how alignment quality and retrieval performance can diverge.
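The distinction between alignment and retrieval is easy to blur, so here is a minimal pure-Python sketch of one way to measure each. It assumes two illustrative metrics that are not necessarily the ones this benchmark uses. The first is an alignment score: the mean cosine similarity between each source entry and its true translation. The second is recall@k: how often the true translation ranks in the top k among all candidates. The function names (`alignment_score`, `recall_at_k`) and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_score(src: List[Vector], tgt: List[Vector]) -> float:
    """Hypothetical alignment metric: mean cosine similarity of the true
    pairs (src[i], tgt[i]), e.g. an English EPG title and its Armenian one."""
    return sum(cosine(s, t) for s, t in zip(src, tgt)) / len(src)


def recall_at_k(src: List[Vector], tgt: List[Vector], k: int = 1) -> float:
    """Hypothetical retrieval metric: fraction of queries src[i] whose true
    match tgt[i] appears among the k most similar candidates in tgt."""
    hits = 0
    for i, s in enumerate(src):
        scores = [cosine(s, t) for t in tgt]
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(tgt)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(src)


# Toy embeddings crowded into a narrow cone: each true pair is very similar,
# but each query is even closer to the *other* candidate.
crowded_src = [[1.0, 0.1], [1.0, 0.2]]
crowded_tgt = [[1.0, 0.2], [1.0, 0.1]]

# Toy embeddings that are well separated.
separated_src = [[1.0, 0.0], [0.0, 1.0]]
separated_tgt = [[0.9, 0.1], [0.1, 0.9]]

if __name__ == "__main__":
    print("crowded:", alignment_score(crowded_src, crowded_tgt),
          recall_at_k(crowded_src, crowded_tgt))
    print("separated:", alignment_score(separated_src, separated_tgt),
          recall_at_k(separated_src, separated_tgt))
```

In the crowded toy case, the true pairs score above 0.99 on alignment, yet recall@1 is zero, because every query sits even closer to a wrong candidate. This is one simple mechanism by which a model can look well aligned while still retrieving poorly.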