Neural Models for Lemmatization and POS-Tagging of Earlier and Late Egyptian (Supporting Hieroglyphic Input) and Demotic

Aleksi Sahala, Eliese-Sophia Lincke


Abstract
We present updated models for BabyLemmatizer for lemmatizing and POS-tagging Demotic, Late Egyptian and Earlier Egyptian, with support for hieroglyphic input. In this paper, we also use data that has not been cleaned of breakages. We achieve a consistent UPOS tagging accuracy of 94% or higher and an XPOS tagging accuracy of 93% or higher for all languages. For lemmatization, which is challenging in all of our test languages due to extensive ambiguity, we demonstrate accuracies from 77% up to 92%, depending on the language and the input script.
Anthology ID:
2025.alp-1.12
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
99–104
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.alp-1.12/
DOI:
10.18653/v1/2025.alp-1.12
Cite (ACL):
Aleksi Sahala and Eliese-Sophia Lincke. 2025. Neural Models for Lemmatization and POS-Tagging of Earlier and Late Egyptian (Supporting Hieroglyphic Input) and Demotic. In Proceedings of the Second Workshop on Ancient Language Processing, pages 99–104, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Neural Models for Lemmatization and POS-Tagging of Earlier and Late Egyptian (Supporting Hieroglyphic Input) and Demotic (Sahala & Lincke, ALP 2025)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.alp-1.12.pdf