Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble

Peerat Limkonchotiwat, Wannaphong Phatthiyaphaibun, Raheem Sarwar, Ekapol Chuangsuwanich, Sarana Nutanong


Abstract
Like many Natural Language Processing tasks, Thai word segmentation is domain-dependent. Researchers have relied on transfer learning to adapt an existing model to a new domain. However, this approach is inapplicable when we can interact only with the input and output layers of a model, that is, when the model is a “black box”. To address this black-box limitation, we propose a filter-and-refine solution based on the stacked-ensemble learning paradigm. We conducted extensive experimental studies comparing our method against state-of-the-art models and transfer learning. Experimental results show that our proposed solution is an effective domain adaptation method and performs comparably to transfer learning.
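The abstract does not spell out how filter-and-refine works, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the black-box segmenter exposes a per-character probability that a character starts a word, that "filter" means selecting characters whose probability is close to 0.5, and that "refine" means letting a second, domain-specific model relabel only those characters. The names `filter_uncertain`, `filter_and_refine`, `to_words`, the `margin` parameter, and the toy models are all hypothetical, and English text is used purely for readability.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces, not taken from the paper or its code release.
BlackBox = Callable[[str], Sequence[float]]           # text -> P(char starts a word)
Refiner = Callable[[str, int, Sequence[float]], int]  # (text, index, probs) -> 0 or 1


def filter_uncertain(probs: Sequence[float], margin: float = 0.2) -> List[int]:
    """Filter step (assumed): indices whose probability is within `margin` of 0.5."""
    return [i for i, p in enumerate(probs) if abs(p - 0.5) < margin]


def filter_and_refine(text: str, black_box: BlackBox, refiner: Refiner,
                      margin: float = 0.2) -> List[int]:
    """Keep confident black-box labels; let the refiner relabel uncertain ones."""
    probs = black_box(text)
    labels = [1 if p >= 0.5 else 0 for p in probs]
    for i in filter_uncertain(probs, margin):
        labels[i] = refiner(text, i, probs)
    return labels


def to_words(text: str, labels: Sequence[int]) -> List[str]:
    """Cut `text` before every character (after the first) labeled 1."""
    words, start = [], 0
    for i in range(1, len(text)):
        if labels[i] == 1:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


# Toy stand-ins for a real black-box segmenter and a domain-adapted refiner.
def toy_black_box(text: str) -> Sequence[float]:
    # Unsure about index 3 (the 'c' in "cream"), confident everywhere else.
    return [0.9, 0.1, 0.1, 0.45, 0.1, 0.1, 0.1, 0.1]


def toy_refiner(text: str, index: int, probs: Sequence[float]) -> int:
    # Pretend the target domain taught us that "cream" starts a new word.
    return 1 if text[index:].startswith("cream") else 0


labels = filter_and_refine("icecream", toy_black_box, toy_refiner)
print(to_words("icecream", labels))  # ['ice', 'cream']
```

In this sketch the black box is never retrained or inspected; only its outputs are consumed, which is the constraint the abstract describes. A real refiner would be a model trained on target-domain data rather than a hand-written rule.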
Anthology ID:
2020.emnlp-main.315
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3841–3847
URL:
https://aclanthology.org/2020.emnlp-main.315
DOI:
10.18653/v1/2020.emnlp-main.315
Cite (ACL):
Peerat Limkonchotiwat, Wannaphong Phatthiyaphaibun, Raheem Sarwar, Ekapol Chuangsuwanich, and Sarana Nutanong. 2020. Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3841–3847, Online. Association for Computational Linguistics.
Cite (Informal):
Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble (Limkonchotiwat et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.315.pdf
Code:
mrpeerat/SEFR_CUT