The Thai Universal Dependency Treebank

Panyut Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, Attapol T. Rutherford


Abstract
Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we addressed these gaps by introducing the Thai Universal Dependency Treebank (TUD), a new Thai treebank consisting of 3,627 trees annotated according to the Universal Dependencies (UD) framework. We then benchmarked 92 dependency parsing models that incorporate pretrained transformers on Thai-PUD and our TUD, achieving state-of-the-art results and shedding light on the optimal model components for Thai dependency parsing. Our error analysis of the models also reveals that polyfunctional words, serial verb constructions, and the lack of rich morphosyntactic features pose the main challenges for Thai dependency parsing.
Anthology ID: 2025.tacl-1.18
Volume: Transactions of the Association for Computational Linguistics, Volume 13
Year: 2025
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 376–391
URL: https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.18/
DOI: 10.1162/tacl_a_00745
Cite (ACL): Panyut Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, and Attapol T. Rutherford. 2025. The Thai Universal Dependency Treebank. Transactions of the Association for Computational Linguistics, 13:376–391.
Cite (Informal): The Thai Universal Dependency Treebank (Sriwirote et al., TACL 2025)
PDF: https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.18.pdf