Armenian-tokenizer
nairabarseghyan/Armenian-tokenizer
Summary
A student project implementing multiple tokenization methods (BPE, WordPiece, SentencePiece, tiktoken) for the Armenian language using Wikipedia data. Includes custom tokenizer implementations, trained models, and evaluation notebooks.
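For readers unfamiliar with BPE, the first method listed, the following is a minimal sketch of how byte-pair-style merges can be learned at the character level. It is not taken from the repository: the function names (`train_bpe`, `merge_pair`, `encode`) and the four-word toy corpus are illustrative assumptions, and a real run would use a much larger corpus such as the Wikipedia data mentioned above.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (hypothetical helper)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (an assumption for illustration, not the project's data).
corpus = ["հայ", "հայեր", "հայերեն", "հայոց"]
merges = train_bpe(corpus, num_merges=2)
print(encode("հայեր", merges))
```

With two merges on this toy corpus, the shared stem "հայ" becomes a single token while the rarer suffix characters stay separate. Production tokenizers typically operate on bytes or pre-tokenized text and handle whitespace and special tokens, which this sketch leaves out.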