Abstract
In task-oriented dialogue systems, response generation from meaning representations (MRs) often suffers from a shortage of training examples, due to the high cost of annotating MR-to-Text pairs. Previous work on self-training leverages fine-tuned conversational models to automatically generate pseudo-labeled MR-to-Text pairs for further fine-tuning. However, some self-augmented data may be noisy or uninformative for the model to learn from. In this work, we propose a two-phase self-augmentation procedure that generates high-quality pseudo-labeled MR-to-Text pairs: the first phase selects the most informative MRs based on the model’s prediction uncertainty; given the selected MRs, the second phase generates accurate responses by aggregating multiple perturbed latent representations of each MR. Empirical experiments on two benchmark datasets, FewShotWOZ and FewShotSGD, show that our method generally outperforms existing self-training methods in both automatic and human evaluations.
- Anthology ID:
- 2022.findings-emnlp.201
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2770–2784
- URL:
- https://aclanthology.org/2022.findings-emnlp.201
- DOI:
- 10.18653/v1/2022.findings-emnlp.201
- Cite (ACL):
- Wanyu Du, Hanjie Chen, and Yangfeng Ji. 2022. Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2770–2784, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation (Du et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2022.findings-emnlp.201.pdf
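For a concrete picture of the two ideas named in the abstract, the sketch below illustrates (1) uncertainty-based MR selection and (2) aggregation of perturbed latent representations. This is an illustrative sketch only, not the authors’ implementation: the function names, the use of entropy as the uncertainty score, and the Gaussian perturbation are all assumptions made for the example.

```python
# Illustrative sketch (NOT the paper's code) of the two-phase idea:
# (1) keep the MRs the model is most uncertain about, scored here by the
#     entropy of a hypothetical per-MR predictive distribution;
# (2) average several noise-perturbed copies of a latent vector before decoding.
import math
import random


def predictive_entropy(probs):
    """Entropy of a categorical distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain_mrs(mr_to_probs, k):
    """Phase 1 (sketch): keep the k MRs with the highest-entropy predictions."""
    ranked = sorted(
        mr_to_probs,
        key=lambda mr: predictive_entropy(mr_to_probs[mr]),
        reverse=True,
    )
    return ranked[:k]


def aggregate_perturbed(latent, n_samples=8, noise_scale=0.1, seed=0):
    """Phase 2 (sketch): average n Gaussian-perturbed copies of a latent vector."""
    rng = random.Random(seed)
    total = [0.0] * len(latent)
    for _ in range(n_samples):
        for i, value in enumerate(latent):
            total[i] += value + rng.gauss(0.0, noise_scale)
    return [t / n_samples for t in total]
```

In practice the uncertainty would come from the fine-tuned model’s own token-level probabilities and the latent vector from its encoder; both are stand-ins here.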