Methodological issues regarding the semi-automatic UD treebank creation of under-resourced languages: the case of Pomak
Stella Markantonatou, Nicolaos Th. Constantinides, Vivian Stamou, Vasileios Arampatzakis, Panagiotis G. Krimpas, George Pavlidis
Abstract
Pomak is an endangered oral Slavic language of Thrace/Greece. We present a short description of its interesting morphological and syntactic features in the UD framework. Because the morphological annotation of the treebank takes advantage of existing resources, it requires a different methodological approach from the one adopted for syntactic annotation that has started from scratch. It also requires the option of obtaining morphological predictions/evaluation separately from the syntactic ones with state-of-the-art NLP tools. Active annotation is applied in various settings in order to identify the best model that would facilitate the ongoing syntactic annotation.
- Anthology ID:
- 2023.udw-1.4
- Volume:
- Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023)
- Month:
- March
- Year:
- 2023
- Address:
- Washington, D.C.
- Venues:
- UDW | SyntaxFest
- SIG:
- SIGPARSE
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–35
- URL:
- https://aclanthology.org/2023.udw-1.4
- Cite (ACL):
- Stella Markantonatou, Nicolaos Th. Constantinides, Vivian Stamou, Vasileios Arampatzakis, Panagiotis G. Krimpas, and George Pavlidis. 2023. Methodological issues regarding the semi-automatic UD treebank creation of under-resourced languages: the case of Pomak. In Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023), pages 27–35, Washington, D.C. Association for Computational Linguistics.
- Cite (Informal):
- Methodological issues regarding the semi-automatic UD treebank creation of under-resourced languages: the case of Pomak (Markantonatou et al., UDW-SyntaxFest 2023)
- PDF:
- https://preview.aclanthology.org/author-url/2023.udw-1.4.pdf