epg-embedding-benchmark

s1mb1o/epg-embedding-benchmark

Evaluating sentence embedding models for cross-lingual TV program guide matching across English, Russian, and Armenian

Jupyter Notebook | Stars: 1 | Forks: 0 | License: MIT | ML/AI

Summary

This repository benchmarks multilingual sentence embedding models on a real-world task: cross-lingual matching of TV program guide (EPG) entries across English, Russian, and Armenian. It focuses on the practical challenge of aligning content in a low-resource language (Armenian) within a recommendation system. The project includes a detailed dataset analysis, an evaluation of multiple models (including E5, LaBSE, and OpenAI embeddings), and key findings on how alignment performance and retrieval performance diverge.
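To make the alignment-versus-retrieval distinction concrete, here is a minimal sketch. It assumes "alignment" means the mean cosine similarity between paired translations and "retrieval" means top-1 nearest-neighbour accuracy; the summary does not state the repository's exact metric definitions, so both are assumptions. The function names and the toy 2-D vectors are hypothetical and are not benchmark data or results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_score(src, tgt):
    """Assumed alignment metric: mean cosine similarity of pairs (src[i], tgt[i])."""
    return sum(cosine(s, t) for s, t in zip(src, tgt)) / len(src)


def top1_accuracy(src, tgt):
    """Assumed retrieval metric: share of src[i] whose nearest tgt vector is tgt[i]."""
    hits = 0
    for i, s in enumerate(src):
        best = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        if best == i:
            hits += 1
    return hits / len(src)


# Hypothetical "crowded" embedding space: every vector points almost the same way,
# so true pairs look very similar, but so do the wrong pairs.
crowded_src = [[1.0, 0.0], [1.0, 0.1]]
crowded_tgt = [[1.0, 0.1], [1.0, 0.0]]

# Hypothetical "spread" embedding space: true pairs are less similar in absolute
# terms, but each source is still closest to its own target.
spread_src = [[1.0, 0.0], [0.0, 1.0]]
spread_tgt = [[1.0, 0.5], [0.5, 1.0]]

crowded_alignment = alignment_score(crowded_src, crowded_tgt)
spread_alignment = alignment_score(spread_src, spread_tgt)
crowded_top1 = top1_accuracy(crowded_src, crowded_tgt)
spread_top1 = top1_accuracy(spread_src, spread_tgt)

print(f"crowded: alignment={crowded_alignment:.3f}, top-1={crowded_top1:.2f}")
print(f"spread:  alignment={spread_alignment:.3f}, top-1={spread_top1:.2f}")
```

In this toy setup the crowded space has the higher alignment score yet retrieves no pair correctly, while the spread space has lower alignment but retrieves every pair. That is one way the two measures can disagree; whether it is the mechanism behind the repository's findings is not stated in the summary.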
