Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5

Thao Anh Dang, Limor Raviv, Lukas Galke


Anthology ID: 2025.icnlsp-1.24
Volume: Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)
Month: August
Year: 2025
Address: Southern Denmark University, Odense, Denmark
Editors: Mourad Abbas, Tariq Yousef, Lukas Galke
Venue: ICNLSP
Publisher: Association for Computational Linguistics
Pages: 242–257
URL: https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.icnlsp-1.24/
Cite (ACL): Thao Anh Dang, Limor Raviv, and Lukas Galke. 2025. Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5. In Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025), pages 242–257, Southern Denmark University, Odense, Denmark. Association for Computational Linguistics.
Cite (Informal): Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5 (Dang et al., ICNLSP 2025)
PDF: https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.icnlsp-1.24.pdf