Abstract
Question generation (QG) approaches based on large neural models require (i) large-scale and (ii) high-quality training data. These two requirements pose difficulties for specific application domains where training data is expensive and difficult to obtain. The effectiveness of trained QG models can degrade significantly when they are applied to a different domain, due to domain shift. In this paper, we explore an unsupervised domain adaptation approach that addresses the lack of training data and the domain shift issue through domain data selection and self-training. We first present a novel answer-aware strategy for domain data selection that selects the data most similar to a new domain. The selected data are then used as pseudo-in-domain data to retrain the QG model. We then present generation-confidence-guided self-training with two methods for modeling generation confidence: (i) the perplexity of the generated questions and (ii) a fluency score. We test our approaches on three large public datasets with different domain similarities, using a transformer-based pre-trained QG model. The results show that our proposed approaches outperform the baselines and demonstrate the viability of unsupervised domain adaptation with answer-aware data selection and self-training on the QG task.
- Anthology ID:
- 2022.findings-naacl.183
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2022
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2388–2401
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.findings-naacl.183/
- DOI:
- 10.18653/v1/2022.findings-naacl.183
- Cite (ACL):
- Peide Zhu and Claudia Hauff. 2022. Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2388–2401, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training (Zhu & Hauff, Findings 2022)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.findings-naacl.183.pdf
- Data:
- Natural Questions, RACE, SciQ
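
The abstract names generated-question perplexity as one of the two confidence signals that guide self-training. The following is a minimal sketch of how such a filter could look, not the authors' implementation. It assumes that per-token natural-log probabilities are already available for each generated question. The `PseudoSample` type, the `select_confident` helper, and the `max_ppl` threshold are hypothetical names introduced only for illustration; the abstract does not specify a threshold or a selection rule.

```python
import math
from typing import List, NamedTuple


class PseudoSample(NamedTuple):
    """A target-domain (context, answer) pair with a model-generated question.

    Hypothetical container for illustration; not taken from the paper.
    """
    context: str
    answer: str
    question: str
    token_logprobs: List[float]  # natural-log probability of each generated token


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        # No tokens means no evidence of confidence; treat as worst case.
        return float("inf")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_confident(samples: List[PseudoSample], max_ppl: float) -> List[PseudoSample]:
    """Keep pseudo-labeled samples whose question perplexity is at most max_ppl.

    Lower perplexity is read here as higher generation confidence. The
    threshold rule is an assumption made for this sketch.
    """
    return [s for s in samples if perplexity(s.token_logprobs) <= max_ppl]
```

Under these assumptions, the retained samples would serve as pseudo-labeled training data for the next round of self-training, while questions the model generated with high perplexity are discarded.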