Charin Polpanumas


2025

The Thai Universal Dependency Treebank
Panyut Sriwirote | Wei Qi Leong | Charin Polpanumas | Santhawat Thanyawong | William Chandra Tjhi | Wirote Aroonmanakun | Attapol T. Rutherford
Transactions of the Association for Computational Linguistics, Volume 13

Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these gaps by introducing the Thai Universal Dependency Treebank (TUD), a new Thai treebank consisting of 3,627 trees annotated according to the Universal Dependencies (UD) framework. We then benchmark 92 dependency parsing models that incorporate pretrained transformers on Thai-PUD and our TUD, achieving state-of-the-art results and shedding light on the optimal model components for Thai dependency parsing. Our error analysis of the models also reveals that polyfunctional words, serial verb constructions, and the lack of rich morphosyntactic features present the main challenges for Thai dependency parsing.

2023

PyThaiNLP: Thai Natural Language Processing in Python
Wannaphong Phatthiyaphaibun | Korakot Chaovavanich | Charin Polpanumas | Arthit Suriyawongkul | Lalita Lowphansirikul | Pattarawat Chormai | Peerat Limkonchotiwat | Thanathip Suntorntip | Can Udomcharoenchaikit
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

We present PyThaiNLP, a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python. It provides a wide range of software, models, and datasets for the Thai language. We first provide a brief historical context of tools for the Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provides, as well as its datasets and pre-trained language models. We then summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.