Sinhala Dependency Treebank (STB)

Chamila Liyanage, Kengatharaiyer Sarveswaran, Thilini Nadungodage, Randil Pushpananda


Abstract
This paper reports the development of the Sinhala Dependency Treebank (STB), the first dependency treebank for the Sinhala language. Sinhala is a morphologically rich, low-resource language with few publicly available linguistic and computational resources. The treebank consists of 100 sentences drawn from a large corpus of contemporary written text, annotated manually according to the Universal Dependencies framework. In addition to describing the approach followed to create the treebank, we discuss some interesting syntactic constructions found in the corpus and how we handled them under the current Universal Dependencies specification.
Anthology ID:
2023.udw-1.3
Volume:
Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023)
Month:
March
Year:
2023
Address:
Washington, D.C.
Editors:
Loïc Grobol, Francis Tyers
Venues:
UDW | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
17–26
URL:
https://aclanthology.org/2023.udw-1.3
Cite (ACL):
Chamila Liyanage, Kengatharaiyer Sarveswaran, Thilini Nadungodage, and Randil Pushpananda. 2023. Sinhala Dependency Treebank (STB). In Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023), pages 17–26, Washington, D.C. Association for Computational Linguistics.
Cite (Informal):
Sinhala Dependency Treebank (STB) (Liyanage et al., UDW-SyntaxFest 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.udw-1.3.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.udw-1.3.mov