EvaCun 2025 Shared Task: Lemmatization and Token Prediction in Akkadian and Sumerian using LLMs

Shai Gordin; Aleksi Sahala; Shahar Spencer; Stav Klein

doi:10.18653/v1/2025.alp-1.33

Abstract

The EvaCun 2025 Shared Task, organized as part of the ALP 2025 workshop and co-located with NAACL 2025, explores how Large Language Models (LLMs) and transformer-based models can improve lemmatization and token prediction for low-resource ancient cuneiform texts. This year our datasets focused on the best-attested ancient Near Eastern languages written in cuneiform, namely Akkadian and Sumerian. Notably, we took advantage of datasets never before used at scale in NLP tasks, primarily first-millennium literature (i.e., “Canonical” texts) provided by the Electronic Babylonian Library (eBL), and Old Babylonian letters and archival texts provided by Archibab. We aim to encourage the development of new computational methods to better analyze and reconstruct cuneiform inscriptions, pushing NLP forward for ancient and low-resource languages. Three teams competed in the lemmatization subtask and one in the token prediction subtask. Each subtask was evaluated alongside a baseline model provided by the organizers.

Anthology ID: 2025.alp-1.33
Volume: Proceedings of the Second Workshop on Ancient Language Processing
Month: May
Year: 2025
Address: The Albuquerque Convention Center, Laguna
Editors: Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues: ALP | WS
Publisher: Association for Computational Linguistics
Pages: 242–250
URL: https://preview.aclanthology.org/corrections-2025-06/2025.alp-1.33/
DOI: 10.18653/v1/2025.alp-1.33
Cite (ACL): Shai Gordin, Aleksi Sahala, Shahar Spencer, and Stav Klein. 2025. EvaCun 2025 Shared Task: Lemmatization and Token Prediction in Akkadian and Sumerian using LLMs. In Proceedings of the Second Workshop on Ancient Language Processing, pages 242–250, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal): EvaCun 2025 Shared Task: Lemmatization and Token Prediction in Akkadian and Sumerian using LLMs (Gordin et al., ALP 2025)
PDF: https://preview.aclanthology.org/corrections-2025-06/2025.alp-1.33.pdf
