Turkish Treebanking: Unifying and Constructing Efforts

Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Abdullatif Köksal, Balkiz Ozturk Basaran, Tunga Gungor, Arzucan Özgür

Abstract
In this paper, we present the current versions of two treebanks: the re-annotation of the Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency (henceforth TNC-UD). The annotation of both treebanks followed the decisions on linguistic adequacy made in the re-annotation of the Turkish IMST-UD Treebank (Türk et al., forthcoming). Both treebanks were annotated using the same annotation process and the same morphological and syntactic analyses. The TNC-UD is planned to contain 10,000 sentences; here we present its first 500 sentences along with the re-annotated PUD Treebank. We also report the parsing results of a graph-based neural parser on the previous and re-annotated versions of PUD, as well as on the TNC-UD. Although we observe a slight decrease in the attachment scores on the Turkish PUD Treebank, the comparisons show that the annotation of the TNC-UD improves the parsing accuracy of Turkish. In addition to the treebanks, we built custom annotation software with advanced filtering and morphological editing options. The treebanks, including a full edit history and the annotation guidelines, and the custom software are publicly available online under an open license.
Anthology ID:
W19-4019
Volume:
Proceedings of the 13th Linguistic Annotation Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Annemarie Friedrich, Deniz Zeyrek, Jet Hoek
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
166–177
URL:
https://aclanthology.org/W19-4019
DOI:
10.18653/v1/W19-4019
Cite (ACL):
Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Abdullatif Köksal, Balkiz Ozturk Basaran, Tunga Gungor, and Arzucan Özgür. 2019. Turkish Treebanking: Unifying and Constructing Efforts. In Proceedings of the 13th Linguistic Annotation Workshop, pages 166–177, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Turkish Treebanking: Unifying and Constructing Efforts (Türk et al., LAW 2019)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/W19-4019.pdf