Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data

Rongsheng Zhang, Yinhe Zheng, Jianzhi Shao, Xiaoxi Mao, Yadong Xi, Minlie Huang


Abstract
Recent advances in open-domain dialogue systems rely on the success of neural models that are trained on large-scale data. However, collecting large-scale dialogue data is usually time-consuming and labor-intensive. To address this data dilemma, we propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data. Specifically, a data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data. A ranking module is employed to filter out low-quality dialogues. Further, a model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs, thereby preventing dialogue models from being affected by the noise in the augmented data. Automatic and manual evaluation indicates that our method can produce high-quality dialogue pairs with diverse contents, and the proposed data-level and model-level dialogue distillation can improve the performance of competitive baselines.
Anthology ID:
2020.emnlp-main.277
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3449–3460
Language:
URL:
https://aclanthology.org/2020.emnlp-main.277
DOI:
10.18653/v1/2020.emnlp-main.277
Bibkey:
Cite (ACL):
Rongsheng Zhang, Yinhe Zheng, Jianzhi Shao, Xiaoxi Mao, Yadong Xi, and Minlie Huang. 2020. Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3449–3460, Online. Association for Computational Linguistics.
Cite (Informal):
Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data (Zhang et al., EMNLP 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.277.pdf
Video:
 https://slideslive.com/38938718
Code
 njuzrs/dialogue_distillation