A UD Treebank for Bohairic Coptic

Amir Zeldes, Nina Speransky, Nicholas E. Wagner, Caroline T. Schroeder


Abstract
Despite recent advances in digital resources for other Coptic dialects, especially Sahidic, Bohairic Coptic, the main Coptic dialect for pre-Mamluk, late Byzantine Egypt, and the contemporary language of the Coptic Church, remains critically under-resourced. This paper presents and evaluates the first syntactically annotated corpus of Bohairic Coptic, sampling data from a range of works, including Biblical text, saints’ lives and Christian ascetic writing. We also explore some of the main differences we observe compared to the existing UD treebank of Sahidic Coptic, the classical dialect of the language, and conduct joint and cross-dialect parsing experiments, revealing the unique nature of Bohairic as a related, but distinct variety from the more often studied Sahidic.
Anthology ID:
2025.udw-1.7
Volume:
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Gosse Bouma, Çağrı Çöltekin
Venues:
UDW | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
59–69
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.7/
Cite (ACL):
Amir Zeldes, Nina Speransky, Nicholas E. Wagner, and Caroline T. Schroeder. 2025. A UD Treebank for Bohairic Coptic. In Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), pages 59–69, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
A UD Treebank for Bohairic Coptic (Zeldes et al., UDW-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.7.pdf