Contextual Unsupervised Clustering of Signs for Ancient Writing Systems

Michele Corazza, Fabio Tamburini, Miguel Valério, Silvia Ferrara


Abstract
The application of machine learning techniques to ancient writing systems is a relatively new idea, and it poses interesting challenges for researchers. One particularly challenging aspect is the scarcity of data for these scripts, which contrasts with the large amounts of data usually available when applying neural models to computational linguistics and other fields. For this reason, any method that attempts to work on ancient scripts needs to be ad hoc and consider paleographic aspects in addition to computational ones. Considering the peculiar characteristics of the script that we used is therefore a crucial part of our work, as any solution needs to account for the particular nature of the writing system that it is applied to. In this work we propose a preliminary evaluation of a novel unsupervised clustering method on the Cypro-Greek syllabary, a writing system from Cyprus. This evaluation shows that our method improves clustering performance by using information about the attested sequences of signs in combination with an unsupervised model for images, with the future goal of applying the methodology to undeciphered writing systems from a related and typologically similar script.
Anthology ID:
2022.lt4hala-1.12
Volume:
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Rachele Sprugnoli, Marco Passarotti
Venue:
LT4HALA
Publisher:
European Language Resources Association
Pages:
84–93
URL:
https://aclanthology.org/2022.lt4hala-1.12
Cite (ACL):
Michele Corazza, Fabio Tamburini, Miguel Valério, and Silvia Ferrara. 2022. Contextual Unsupervised Clustering of Signs for Ancient Writing Systems. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 84–93, Marseille, France. European Language Resources Association.
Cite (Informal):
Contextual Unsupervised Clustering of Signs for Ancient Writing Systems (Corazza et al., LT4HALA 2022)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2022.lt4hala-1.12.pdf