Dialogue state tracking (DST) is a core component of task-oriented dialogue systems; it tracks the user’s current goal over the course of a dialogue. Recently, special attention has been paid to applying existing DST models to new domains, that is, to zero-shot cross-domain transfer. While recent state-of-the-art models leverage large pre-trained language models, no work has been done on understanding and improving the results of the earliest zero-shot models, such as SUMBT. In this paper, we therefore propose to improve SUMBT’s zero-shot results on MultiWOZ by using attention modulation during inference. This method significantly improves SUMBT’s zero-shot results on two domains and does not worsen the initial performance, with the great advantage of requiring no additional training.
Knowledge transfer between neural language models is a widely used technique that has been shown to improve performance on a multitude of natural language tasks, in particular with the recent rise of large pre-trained language models like BERT. Similarly, strong cross-lingual transfer has been shown to occur in multilingual language models. It is therefore important to better understand this phenomenon and its limits. While most studies of cross-lingual transfer train on independent and identically distributed (i.i.d.) samples, in this paper we study cross-lingual transfer in a continual learning setting on two sequence labeling tasks: slot-filling and named entity recognition. We investigate this by training multilingual BERT on sequences of 9 languages, one language at a time, on the MultiATIS++ and MultiCoNER corpora. Our first finding is that forward transfer between languages is retained even though forgetting occurs. Additional experiments show that lost performance can be recovered with as little as a single training epoch, even when forgetting was high, which can be explained by a progressive shift of model parameters towards a better multilingual initialization. We also find that commonly used metrics might be insufficient to assess continual learning performance.
The aim of this article is to define how Lifelong Learning (LL) could be applied to task-oriented dialogue systems. A dialogue system should be able to learn new knowledge after it has been deployed, and to do so continually through its interactions with the user. We identify two aspects that apply to such a system: improving its conversational abilities and enriching its knowledge base. We apply these ideas to a chatbot developed as part of the LIHLITH project. We show that such a system must be able to (1) detect the presence of an unknown situation, (2) decide when and how to interact with the user in order to extract new knowledge, and (3) adapt to this new knowledge while taking its reliability into account.