Cross-Lingual Data Augmentation For Thai Question-Answering
Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat
Abstract
This paper presents an innovative data augmentation framework with data quality control designed to enhance the robustness of Question Answering (QA) models in low-resource languages, particularly Thai. Recognizing the challenges posed by the scarcity and quality of training data, we leverage data augmentation techniques in both monolingual and cross-lingual settings. Our approach augments and enriches the original dataset, thereby increasing its linguistic diversity and robustness. We evaluate the robustness of our framework on Machine Reading Comprehension, and the experimental results illustrate the potential of data augmentation to effectively increase training data and improve model generalization in low-resource language settings, offering a promising direction for future data augmentation work.
- Anthology ID:
- 2023.genbench-1.16
- Volume:
- Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Koustuv Sinha, Amirhossein Kazemnejad, Christos Christodoulopoulos, Ryan Cotterell, Elia Bruni
- Venues:
- GenBench | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 193–203
- URL:
- https://aclanthology.org/2023.genbench-1.16
- DOI:
- 10.18653/v1/2023.genbench-1.16
- Cite (ACL):
- Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, and Peerat Limkonchotiwat. 2023. Cross-Lingual Data Augmentation For Thai Question-Answering. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, pages 193–203, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Cross-Lingual Data Augmentation For Thai Question-Answering (Pengpun et al., GenBench-WS 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.genbench-1.16.pdf