DIACU: A dataset for the DIAchronic analysis of Church Slavonic

Maria Cassese, Giovanni Puccetti, Marianna Napolitano, Andrea Esuli


Abstract
The Church Slavonic language has evolved over time without being formalized into a precise grammar. As a result, there is currently no clearly outlined history tracing its evolution. In recent years, however, there has been a greater effort to digitize Church Slavonic texts, partly motivated by increased sensitivity to the need to preserve multilingual knowledge. To exploit these resources, we propose DIACU (DIAchronic Analysis of Church Slavonic), a comprehensive collection of several existing corpora in Church Slavonic. In this work, we thoroughly describe the collection of this novel dataset and test its effectiveness as a training set for attributing Slavonic texts to specific periods. The dataset and the code for the experiments are available at https://github.com/MariaCassese/DIACU.
Anthology ID:
2025.bsnlp-1.12
Volume:
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
Venues:
BSNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
101–107
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.12/
Cite (ACL):
Maria Cassese, Giovanni Puccetti, Marianna Napolitano, and Andrea Esuli. 2025. DIACU: A dataset for the DIAchronic analysis of Church Slavonic. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 101–107, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
DIACU: A dataset for the DIAchronic analysis of Church Slavonic (Cassese et al., BSNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.12.pdf