Piotr Zadworny
2025
Assignment of account type to proto-cuneiform economic texts with Multi-Class Support Vector Machines
Piotr Zadworny, Shai Gordin
Proceedings of the Second Workshop on Ancient Language Processing
We investigate the use of machine learning for classifying proto-cuneiform economic texts (3500–3000 BCE), leveraging Multi-Class Support Vector Machines (MSVM) to assign each text its account type based on its content. Proto-cuneiform presents unique challenges: it does not encode spoken language, yet it is transcribed into linear formats that obscure the structural elements of the original tablets. We address this by reformatting transcriptions, experimenting with different tokenization strategies, and optimizing feature extraction. Our workflow achieves high labeling reliability and enables significant metadata enrichment. Beyond improving the organization of digital corpora, our approach opens the possibility of identifying economic institutions in ancient Mesopotamian archives, providing a new tool for Assyriological research.
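The abstract names three ingredients: linearized transcriptions, a tokenization strategy, and a multi-class SVM over the extracted features. The sketch below shows one way these could fit together. It assumes whitespace-separated sign readings, sign unigram plus bigram count features, and a one-vs-rest decomposition in which each binary linear SVM is trained by Pegasos-style stochastic subgradient descent. The function names (`tokenize`, `build_vocab`, `vectorize`, `OneVsRestSVM`), the hyperparameters, and the toy texts and labels are all hypothetical illustrations, not the authors' pipeline or data.

```python
import random
from collections import Counter


def tokenize(text, max_n=2):
    """Sign unigrams plus sign n-grams (n = 2..max_n) from one linearized text.

    Hypothetical format: sign readings separated by whitespace.
    """
    signs = text.split()
    feats = list(signs)
    for n in range(2, max_n + 1):
        feats += ["_".join(signs[i:i + n]) for i in range(len(signs) - n + 1)]
    return feats


def build_vocab(token_lists):
    """Assign a column index to every feature seen in the training texts."""
    vocab = {}
    for toks in token_lists:
        for tok in toks:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Sparse count vector {column: count}; features outside vocab are dropped."""
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}


def dot(w, x):
    return sum(w[j] * v for j, v in x.items())


class OneVsRestSVM:
    """Multi-class linear SVM built from one binary hinge-loss classifier per
    class, each trained by Pegasos-style stochastic subgradient descent
    (no intercept term, no projection step)."""

    def __init__(self, lam=0.1, epochs=500, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, X, y, dim):
        self.classes = sorted(set(y))
        self.w = {c: [0.0] * dim for c in self.classes}
        rng = random.Random(self.seed)
        order = list(range(len(X)))
        t = 0
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (self.lam * t)  # decaying step size
                for c in self.classes:
                    w = self.w[c]
                    target = 1.0 if y[i] == c else -1.0
                    margin = target * dot(w, X[i])  # margin before the update
                    shrink = 1.0 - eta * self.lam  # step on the L2 regularizer
                    for j in range(dim):
                        w[j] *= shrink
                    if margin < 1.0:  # hinge loss active: step toward target
                        for j, v in X[i].items():
                            w[j] += eta * target * v
        return self

    def predict(self, x):
        """Pick the class whose binary classifier gives x the highest score."""
        return max(self.classes, key=lambda c: dot(self.w[c], x))


# Invented toy data: sign labels loosely modeled on ATF-style readings and
# made-up account types. Not taken from any real corpus.
train = [
    ("N01 N01 SZE GAL", "grain"),
    ("N14 SZE BA", "grain"),
    ("N01 UDU SAL", "livestock"),
    ("N14 UDU NITA", "livestock"),
    ("N01 KU6 A", "fish"),
    ("N14 KU6 SZU", "fish"),
]
tokens = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
vocab = build_vocab(tokens)
X = [vectorize(toks, vocab) for toks in tokens]
model = OneVsRestSVM().fit(X, labels, dim=len(vocab))
print(model.predict(vectorize(tokenize("SZE GAL"), vocab)))  # grain
```

One-vs-rest is only one way to obtain a multi-class SVM; a joint formulation such as Crammer-Singer would be an equally reasonable assumption, and swapping `tokenize` for a different segmentation is where alternative tokenization strategies would plug in.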