Ehsan Hosseini-Asl
2021
Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models
Tianxing He | Bryan McCann | Caiming Xiong | Ehsan Hosseini-Asl
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
In this work, we explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders (e.g., RoBERTa) for natural language understanding (NLU) tasks. Our experiments show that EBM training can help the model reach better calibration, competitive with strong baselines, with little or no loss in accuracy. We discuss three variants of energy functions (namely scalar, hidden, and sharp-hidden) that can be defined on top of a text encoder, and compare them in experiments. Due to the discreteness of text data, we adopt noise contrastive estimation (NCE) to train the energy-based model. To make NCE training more effective, we train an auto-regressive noise model with the masked language model (MLM) objective.
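The abstract names the three energy variants and the NCE objective without spelling them out. Below is a minimal sketch of how such energies can be read off a text encoder's pooled hidden state, plus a binary NCE loss against a noise model. All names (EnergyHeads, nce_loss, the exact variant definitions) are illustrative assumptions, not the authors' released code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyHeads(nn.Module):
    """Three candidate energies E(x) defined on an encoder's pooled state h."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)
        self.scalar_head = nn.Linear(hidden_dim, 1)  # extra head for the "scalar" variant

    def forward(self, h: torch.Tensor):
        z = self.classifier(h)                       # task logits, shape (B, num_labels)
        e_scalar = self.scalar_head(h).squeeze(-1)   # "scalar": separate scalar head
        e_hidden = -torch.logsumexp(z, dim=-1)       # "hidden": energy tied to the logits
        e_sharp = -z.max(dim=-1).values              # "sharp-hidden": max logit only
        return z, e_scalar, e_hidden, e_sharp

def nce_loss(e_data, logq_data, e_noise, logq_noise, k: int = 1):
    """Binary NCE: classify real text vs. samples from a noise model q.

    Treating exp(-E(x)) as an unnormalized model density, the Bayes-optimal
    logit for "x is real data" is -E(x) - log(k * q(x)).
    """
    logit_data = -e_data - logq_data - math.log(k)
    logit_noise = -e_noise - logq_noise - math.log(k)
    return (F.binary_cross_entropy_with_logits(logit_data, torch.ones_like(logit_data))
            + k * F.binary_cross_entropy_with_logits(logit_noise, torch.zeros_like(logit_noise)))
```

Note that under this reading, the "hidden" and "sharp-hidden" variants reuse the classifier's own logits, so the NCE term directly shapes the scores used for prediction, while only the "scalar" variant adds parameters.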
2019
Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
Chien-Sheng Wu | Andrea Madotto | Ehsan Hosseini-Asl | Caiming Xiong | Richard Socher | Pascale Fung
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Over-dependence on domain ontology and a lack of knowledge sharing across domains are two practical yet under-studied problems in dialogue state tracking. Existing approaches generally fall short when tracking unknown slot values during inference and often have difficulty adapting to new domains. In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating transfer when predicting (domain, slot, value) triplets not encountered during training. Our model is composed of an utterance encoder, a slot gate, and a state generator, which are shared across domains. Empirical results demonstrate that TRADE achieves state-of-the-art 48.62% joint goal accuracy for the five domains of MultiWOZ, a human-human dialogue dataset. In addition, we demonstrate its transfer ability by simulating zero-shot and few-shot dialogue state tracking for unseen domains. TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, and is able to adapt to few-shot cases without forgetting already trained domains.
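To make the three named components concrete, here is a minimal sketch of one generator step with a pointer-generator-style soft copy over the dialogue history. The GRU choice, dimensions, and names (TradeSketch, decode_step) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TradeSketch(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid, batch_first=True)  # utterance encoder
        self.decoder = nn.GRU(emb_dim, hid, batch_first=True)  # state generator
        self.slot_gate = nn.Linear(hid, 3)   # {ptr, dontcare, none}
        self.gen_prob = nn.Linear(hid, 1)    # mixing weight between generate and copy
        self.out = nn.Linear(hid, vocab_size)

    def slot_gate_logits(self, q: torch.Tensor) -> torch.Tensor:
        # slot gate: decide per slot whether to use the generated value
        return self.slot_gate(q)

    def decode_step(self, y_prev, dec_h, enc_out, src_ids):
        # one state-generator step: attend over encoded history tokens, then
        # mix a vocabulary distribution with a copy distribution over sources
        _, dec_h = self.decoder(self.emb(y_prev).unsqueeze(1), dec_h)
        q = dec_h[-1]                                                   # (B, hid)
        attn = F.softmax(torch.bmm(enc_out, q.unsqueeze(2)).squeeze(2), dim=-1)
        p_vocab = F.softmax(self.out(q), dim=-1)                        # (B, V)
        p_copy = torch.zeros_like(p_vocab).scatter_add(1, src_ids, attn)
        g = torch.sigmoid(self.gen_prob(q))                             # (B, 1)
        return g * p_vocab + (1 - g) * p_copy, dec_h, attn
```

Because the output distribution mixes generation with copying from the dialogue history, the generator can emit slot values never seen in training, which is what makes the zero-shot and few-shot transfer described above possible.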