Pengxiu Lu
2025
Lemmatization of Cuneiform Languages Using the ByT5 Model
Pengxiu Lu, Yonglong Huang, Jing Xu, Minxuan Feng, Chao Xu
Proceedings of the Second Workshop on Ancient Language Processing
Lemmatization of cuneiform languages presents a unique challenge due to their complex writing system, which combines syllabic and logographic elements. In this study, we investigate the effectiveness of the ByT5 model in addressing this challenge by developing and evaluating a ByT5-based lemmatization system. Experimental results demonstrate that ByT5 outperforms mT5 in this task, achieving an accuracy of 80.55% on raw lemmas and 82.59% on generalized lemmas, where sense numbers are removed. These findings highlight the potential of ByT5 for lemmatizing cuneiform languages and provide useful insights for future work on ancient text lemmatization.
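The difference between the two reported metrics can be illustrated with a minimal sketch. The abstract does not specify how sense numbers are encoded, so this example assumes they appear as a trailing Roman numeral separated from the lemma by whitespace (e.g. "banû II"). The helper names `generalize` and `accuracy` and the toy predictions are hypothetical and are not taken from the paper or its data.

```python
import re

# Assumption: a sense number is a trailing Roman numeral preceded by
# whitespace, e.g. "banû II". The paper's actual format may differ.
_SENSE_SUFFIX = re.compile(r"\s+[IVX]+$")


def generalize(lemma: str) -> str:
    """Return the lemma with the assumed trailing sense number removed."""
    return _SENSE_SUFFIX.sub("", lemma.strip())


def accuracy(predictions, references, normalize=lambda s: s):
    """Exact-match accuracy after applying `normalize` to both sides."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Toy, invented examples for illustration only.
preds = ["šarru I", "banû I", "ilu I", "awīlu I"]
golds = ["šarru I", "banû II", "ilu I", "amīlu I"]

raw_acc = accuracy(preds, golds)                        # sense numbers must match
gen_acc = accuracy(preds, golds, normalize=generalize)  # sense numbers ignored

print(f"raw: {raw_acc:.2%}, generalized: {gen_acc:.2%}")
```

Under this assumption, a prediction that picks the right headword but the wrong sense number counts as an error on raw lemmas and as correct on generalized lemmas, which is why the generalized score can only be equal to or higher than the raw score.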