Eliese-Sophia Lincke
2025
Neural Models for Lemmatization and POS-Tagging of Earlier and Late Egyptian (Supporting Hieroglyphic Input) and Demotic
Aleksi Sahala | Eliese-Sophia Lincke
Proceedings of the Second Workshop on Ancient Language Processing
We present updated models for BabyLemmatizer for lemmatizing and POS-tagging Demotic, Late Egyptian and Earlier Egyptian, with support for hieroglyphic input. In this paper, we also use data that has not been cleaned of breakages. We achieve a consistent UPOS tagging accuracy of 94% or higher and an XPOS tagging accuracy of 93% or higher for all languages. For lemmatization, which is challenging in all of our test languages due to extensive ambiguity, we demonstrate accuracies from 77% up to 92%, depending on the language and the input script.
2024
Neural Lemmatization and POS-tagging models for Coptic, Demotic and Earlier Egyptian
Aleksi Sahala | Eliese-Sophia Lincke
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
We present models for lemmatizing and POS-tagging Earlier Egyptian, Coptic and Demotic to test the performance of our pipeline for the ancient languages of Egypt. Of these languages, Demotic and Egyptian are known to be difficult to annotate due to their high degree of ambiguity. We report lemmatization accuracies of 86%, 91% and 99%, and XPOS-tagging accuracies of 89%, 95% and 98% for Earlier Egyptian, Demotic and Coptic, respectively.