DialAug: Mixing up Dialogue Contexts in Contrastive Learning for Robust Conversational Modeling
Lahari Poddar | Peiyao Wang | Julia Reinspach
Proceedings of the 29th International Conference on Computational Linguistics
Retrieval-based conversational systems learn to rank response candidates for a given dialogue context by computing the similarity between their vector representations. However, training on a single textual form of the multi-turn context limits a model's ability to learn representations that generalize to the natural perturbations seen during inference. In this paper, we propose a framework that incorporates augmented versions of a dialogue context into the learning objective. We use contrastive learning as an auxiliary objective to learn robust dialogue context representations that are invariant to the perturbations injected by the augmentation method. We experiment with four benchmark dialogue datasets and demonstrate that our framework combines well with existing augmentation methods and can significantly improve over baseline BERT-based ranking architectures. Furthermore, we propose a novel data augmentation method, ConMix, that adds token-level perturbations by stochastically mixing in tokens from other contexts in the batch. We show that our proposed augmentation method outperforms previous data augmentation approaches and yields dialogue representations that are more robust to common perturbations seen during inference.
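The abstract gives no details on ConMix's sampling procedure or on the exact form of the contrastive loss. What follows is therefore only a minimal sketch under stated assumptions. The names `conmix`, `mix_prob`, `info_nce` and `temperature` are hypothetical. Each token is assumed to be replaced, with a fixed per-token probability, by a token drawn from a randomly chosen other context in the batch. The auxiliary objective is assumed to be a standard InfoNCE loss, in which each context's augmented view is its positive and the other augmented views in the batch serve as negatives. In a real system the vectors would come from the BERT context encoder; here plain Python lists stand in for them.

```python
import math
import random
from typing import List, Sequence


def conmix(batch: List[List[str]], mix_prob: float = 0.15, seed: int = 0) -> List[List[str]]:
    """Hypothetical ConMix-style augmentation (sampling details are assumptions).

    Each token of each context is, with probability `mix_prob`, replaced by a
    token drawn uniformly from a different, non-empty context in the same batch.
    The input batch is not modified; a new batch is returned.
    """
    rng = random.Random(seed)
    augmented = []
    for i, context in enumerate(batch):
        # Candidate donor contexts: every other non-empty context in the batch.
        others = [j for j in range(len(batch)) if j != i and batch[j]]
        new_context = []
        for token in context:
            if others and rng.random() < mix_prob:
                donor = batch[rng.choice(others)]
                new_context.append(rng.choice(donor))  # mix in a foreign token
            else:
                new_context.append(token)  # keep the original token
        augmented.append(new_context)
    return augmented


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss (an assumed choice of contrastive objective).

    anchors[i] is the embedding of an original context and positives[i] the
    embedding of its augmented view; positives[j] for j != i act as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n
```

During training, such an auxiliary term would typically be added to the response-ranking loss with some weight; that weight and the mixing probability are tuning choices the abstract does not specify.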
Large-scale pretrained transformer models have demonstrated state-of-the-art (SOTA) performance on a variety of NLP tasks. Numerous pretrained models are now available in different model flavors and languages and can easily be adapted to a downstream task. However, only a limited number of models are available for dialogue tasks, particularly goal-oriented dialogue tasks. In addition, the available pretrained models are trained on general-domain language, creating a mismatch between the pretraining language and the language of the downstream domain. In this contribution, we present CS-BERT, a BERT model pretrained on millions of dialogues in the customer service domain. We evaluate CS-BERT on several downstream customer service dialogue tasks and demonstrate that our in-domain pretraining is advantageous compared to other pretrained models in both zero-shot and finetuning experiments, especially in a low-resource data setting.