Assignment of account type to proto-cuneiform economic texts with Multi-Class Support Vector Machines

Piotr Zadworny, Shai Gordin


Abstract
We investigate the use of machine learning for classifying proto-cuneiform economic texts (3,500–3,000 BCE), leveraging Multi-Class Support Vector Machines (MSVM) to assign text types based on content. Proto-cuneiform presents unique challenges: it does not encode spoken language, yet it is transcribed into linear formats that obscure the original structural elements. We address this by reformatting transcriptions, experimenting with different tokenization strategies, and optimizing feature extraction. Our workflow achieves high labeling reliability and enables significant metadata enrichment. In addition to improving digital corpus organization, our approach makes it possible to identify economic institutions in ancient Mesopotamian archives, providing a new tool for Assyriological research.
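The abstract does not specify which tokenizers, features, or SVM implementation were used, so the following is only a minimal sketch of the general technique under stated assumptions: transliterations are turned into bag-of-sign-token counts, and a one-vs-rest multi-class linear SVM is trained with Pegasos-style subgradient updates (a swapped-in training method, not necessarily the authors'). The two tokenizers (`tokenize_signs`, `tokenize_no_counts`), the `NUMERAL` pattern, the `OneVsRestSVM` class, and the toy transliterations and labels are all hypothetical, invented for illustration rather than drawn from the corpus. Comparing the two tokenizers mirrors, in miniature, the idea of experimenting with tokenization strategies.

```python
import re
from collections import Counter

# HYPOTHETICAL tokenizers for CDLI-style transliterations (illustration only).
NUMERAL = re.compile(r"^\d+\((N\d+)\)$")  # e.g. "3(N01)": count 3 of numeral sign N01


def tokenize_signs(line):
    """Keep each whitespace-delimited sign, numerals together with their counts."""
    return line.split()


def tokenize_no_counts(line):
    """Reduce counted numerals such as '3(N01)' to the bare numeral sign 'N01'."""
    tokens = []
    for tok in line.split():
        match = NUMERAL.match(tok)
        tokens.append(match.group(1) if match else tok)
    return tokens


def featurize(tokens):
    """Bag-of-tokens counts plus a constant bias feature."""
    feats = Counter(tokens)
    feats["__bias__"] = 1
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary(xs, ys, lam=0.01, epochs=10):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style updates."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * dot(w, x)  # evaluated with the current weights
            for k in w:  # shrinkage from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge-loss subgradient only on margin violations
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


class OneVsRestSVM:
    """Multi-class SVM built from one binary SVM per label (one-vs-rest)."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def fit(self, docs, labels):
        xs = [featurize(self.tokenizer(d)) for d in docs]
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_binary(xs, [1 if label == c else -1 for label in labels])
            for c in self.classes
        }
        return self

    def predict(self, doc):
        x = featurize(self.tokenizer(doc))
        # The label whose binary classifier gives the highest score wins.
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


# Invented toy "transliterations" and labels; NOT real corpus data.
docs = [
    "3(N01) SZE~a",
    "2(N14) 5(N01) SZE~a",
    "4(N01) UDU~a",
    "1(N14) UDU~a",
    "2(N01) SAL",
    "6(N01) SAL KUR~a",
]
labels = ["grain", "grain", "livestock", "livestock", "labor", "labor"]

for tokenizer in (tokenize_signs, tokenize_no_counts):
    model = OneVsRestSVM(tokenizer).fit(docs, labels)
    unseen = ("7(N01) SZE~a", "9(N01) UDU~a", "8(N01) SAL")
    print(tokenizer.__name__, [model.predict(d) for d in unseen])
# Both tokenizers print ['grain', 'livestock', 'labor'] on this toy data.
```

On real data one would normally use a library classifier such as scikit-learn's `LinearSVC` (one-vs-rest by default) instead of the hand-written trainer; the pure-Python version only keeps the sketch dependency-free. The interesting variable is the tokenizer: keeping counts makes every numeral-plus-count combination a distinct feature, while stripping counts lets the model share evidence across texts that use the same numeral signs.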
Anthology ID:
2025.alp-1.3
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
22–30
URL:
https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.alp-1.3/
Cite (ACL):
Piotr Zadworny and Shai Gordin. 2025. Assignment of account type to proto-cuneiform economic texts with Multi-Class Support Vector Machines. In Proceedings of the Second Workshop on Ancient Language Processing, pages 22–30, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Assignment of account type to proto-cuneiform economic texts with Multi-Class Support Vector Machines (Zadworny & Gordin, ALP 2025)
PDF:
https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.alp-1.3.pdf