Armenian-tokenizer

nairabarseghyan/Armenian-tokenizer

Jupyter Notebook | Stars: 0 | Forks: 0 | Language/NLP

Summary

A student project that implements several tokenization methods (BPE, WordPiece, SentencePiece, tiktoken) for the Armenian language, using Wikipedia data. It includes custom tokenizer implementations, trained models, and evaluation notebooks.
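
To illustrate the first method listed, here is a minimal sketch of byte-pair encoding (BPE) training on Armenian text. It is not the repository's code: the function names (`train_bpe`, `encode`), the whitespace pre-tokenization, and the tiny sample corpus are all assumptions made for the example. The idea is to start from single characters and repeatedly merge the most frequent adjacent pair.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus (hypothetical helper)."""
    # Each distinct word starts as a tuple of single characters.
    words = {tuple(w): f for w, f in Counter(corpus.split()).items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties resolve to the first pair seen
        merged_words = {}
        for symbols, freq in words.items():
            key = tuple(apply_merge(list(symbols), best))
            merged_words[key] = merged_words.get(key, 0) + freq
        words = merged_words
        merges.append(best)
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in training order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Toy corpus for illustration only; the project itself uses Wikipedia data.
merges = train_bpe("հայ հայ հայ", num_merges=10)
print(merges)
print(encode("հայեր", merges))
```

A production tokenizer would also need normalization, handling of characters unseen in training, and a mapping from tokens to integer ids; those are left out here to keep the merge loop visible.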
