Abstract
This study introduces a methodology, based on a pretrained large language model, for annotating the first dependency treebank of Ottoman Turkish. Our experimental results show that iteratively i) pseudo-annotating data with a multilingual BERT-based parsing model, ii) manually correcting the pseudo-annotations, and iii) fine-tuning the parsing model on the corrected annotations speeds up and simplifies the challenging dependency annotation process. The resulting treebank, which will become part of the Universal Dependencies (UD) project, will facilitate automated analysis of Ottoman Turkish documents and help unlock the linguistic richness of this historical heritage.
- Anthology ID:
- 2024.law-1.18
- Volume:
- Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
- Month:
- March
- Year:
- 2024
- Address:
- St. Julians, Malta
- Editors:
- Sophie Henning, Manfred Stede
- Venues:
- LAW | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 188–196
- URL:
- https://aclanthology.org/2024.law-1.18
- Cite (ACL):
- Şaziye Özateş, Tarık Tıraş, Efe Genç, and Esma Bilgin Tasdemir. 2024. Dependency Annotation of Ottoman Turkish with Multilingual BERT. In Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII), pages 188–196, St. Julians, Malta. Association for Computational Linguistics.
- Cite (Informal):
- Dependency Annotation of Ottoman Turkish with Multilingual BERT (Özateş et al., LAW-WS 2024)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2024.law-1.18.pdf
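The three-step loop described in the abstract (pseudo-annotate, correct, fine-tune) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyParser`, `bootstrap_annotation`, and the `correct` callback are hypothetical names. The toy parser's "most frequent (head offset, relation) per word form" rule stands in for a multilingual BERT-based parser, and `correct` simulates the human annotator. Fine-tuning on all corrected data accumulated so far, rather than only on the newest batch, is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# One (head, deprel) pair per token; heads are 1-based and 0 marks the root.
Parse = List[Tuple[int, str]]
Sentence = List[str]


class ToyParser:
    """Hypothetical stand-in for a multilingual BERT-based parser.

    "Fine-tuning" stores, for each word form, the most frequent
    (relative head offset, deprel) observed in corrected data.
    """

    def __init__(self) -> None:
        self.model: Dict[str, Tuple[int, str]] = {}

    def predict(self, tokens: Sentence) -> Parse:
        n = len(tokens)
        parse: Parse = []
        for i, word in enumerate(tokens, start=1):
            # Baseline guess: first token is the root, the others attach to it.
            guess = (0, "root") if i == 1 else (1, "dep")
            if word in self.model:
                offset, rel = self.model[word]
                head = 0 if rel == "root" else i + offset
                # Use the learned attachment only if it is a valid, non-self head.
                if 0 <= head <= n and head != i:
                    guess = (head, rel)
            parse.append(guess)
        return parse

    def fine_tune(self, data: Sequence[Tuple[Sentence, Parse]]) -> None:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, parse in data:
            for i, (word, (head, rel)) in enumerate(zip(tokens, parse), start=1):
                offset = 0 if head == 0 else head - i
                counts[word][(offset, rel)] += 1
        self.model = {w: c.most_common(1)[0][0] for w, c in counts.items()}


def bootstrap_annotation(
    parser: ToyParser,
    batches: Sequence[Sequence[Sentence]],
    correct: Callable[[Sentence, Parse], Parse],
) -> Tuple[List[Tuple[Sentence, Parse]], List[int]]:
    """Run the annotate/correct/fine-tune loop; return treebank and edits per round."""
    treebank: List[Tuple[Sentence, Parse]] = []
    edits_per_round: List[int] = []
    for batch in batches:
        edits = 0
        for tokens in batch:
            pseudo = parser.predict(tokens)   # i) pseudo-annotate
            gold = correct(tokens, pseudo)    # ii) manual correction (simulated)
            edits += sum(p != g for p, g in zip(pseudo, gold))
            treebank.append((tokens, gold))
        parser.fine_tune(treebank)            # iii) fine-tune on all corrected data so far
        edits_per_round.append(edits)
    return treebank, edits_per_round
```

Counting the (head, deprel) pairs the annotator has to change in each round gives a rough proxy for annotation effort; in this sketch, that count should shrink as the parser sees more corrected data, which is the intuition behind the speed-up the abstract reports.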