Jason Krone


2021

pdf bib
On the Robustness of Intent Classification and Slot Labeling in Goal-oriented Dialog Systems to Real-world Noise
Sailik Sengupta | Jason Krone | Saab Mansour
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Intent Classification (IC) and Slot Labeling (SL) models, which form the basis of dialogue systems, often encounter noisy data in real-word environments. In this work, we investigate how robust IC/SL models are to noisy data. We collect and publicly release a test-suite for seven common noise types found in production human-to-bot conversations (abbreviations, casing, misspellings, morphological variants, paraphrases, punctuation and synonyms). On this test-suite, we show that common noise types substantially degrade the IC accuracy and SL F1 performance of state-of-the-art BERT-based IC/SL models. By leveraging cross-noise robustness transfer, i.e. training on one noise type to improve robustness on another noise type, we design aggregate data-augmentation approaches that increase the model performance across all seven noise types by +10.8% for IC accuracy and +15 points for SL F1 on average. To the best of our knowledge, this is the first work to present a single IC/SL model that is robust to a wide range of noise phenomena.

pdf bib
Soft Layer Selection with Meta-Learning for Zero-Shot Cross-Lingual Transfer
Weijia Xu | Batool Haider | Jason Krone | Saab Mansour
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

Multilingual pre-trained contextual embedding models (Devlin et al., 2019) have achieved impressive performance on zero-shot cross-lingual transfer tasks. Finding the most effective fine-tuning strategy to fine-tune these models on high-resource languages so that it transfers well to the zero-shot languages is a non-trivial task. In this paper, we propose a novel meta-optimizer to soft-select which layers of the pre-trained model to freeze during fine-tuning. We train the meta-optimizer by simulating the zero-shot transfer scenario. Results on cross-lingual natural language inference show that our approach improves over the simple fine-tuning baseline and X-MAML (Nooralahzadeh et al., 2020).

2020

pdf bib
Learning to Classify Intents and Slot Labels Given a Handful of Examples
Jason Krone | Yi Zhang | Mona Diab
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

Intent classification (IC) and slot filling (SF) are core components in most goal-oriented dialogue systems. Current IC/SF models perform poorly when the number of training examples per class is small. We propose a new few-shot learning task, few-shot IC/SF, to study and improve the performance of IC and SF models on classes not seen at training time in ultra low resource scenarios. We establish a few-shot IC/SF benchmark by defining few-shot splits for three public IC/SF datasets, ATIS, TOP, and Snips. We show that two popular few-shot learning algorithms, model agnostic meta learning (MAML) and prototypical networks, outperform a fine-tuning baseline on this benchmark. Prototypical networks achieves significant gains in IC performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to SF on all three datasets. In addition, we demonstrate that joint training as well as the use of pre-trained language models, ELMo and BERT in our case, are complementary to these few-shot learning methods and yield further gains.

pdf bib
Augmented Natural Language for Generative Sequence Labeling
Ben Athiwaratkun | Cicero Nogueira dos Santos | Jason Krone | Bing Xiang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework general purpose, performing well on few-shot learning, low resource, and high resource tasks. We demonstrate these advantages on popular named entity recognition, slot labeling, and intent classification benchmarks. We set a new state-of-the-art for few-shot slot labeling, improving substantially upon the previous 5-shot (75.0% to 90.9%) and 1-shot (70.4% to 81.0%) state-of-the-art results. Furthermore, our model generates large improvements (46.27% to 63.83%) in low resource slot labeling over a BERT baseline by incorporating label semantics. We also maintain competitive results on high resource tasks, performing within two points of the state-of-the-art on all tasks and setting a new state-of-the-art on the SNIPS dataset.

2019

pdf bib
Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data
Denis Peskov | Nancy Clarke | Jason Krone | Brigi Fodor | Yi Zhang | Adel Youssef | Mona Diab
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The need for high-quality, large-scale, goal-oriented dialogue datasets continues to grow as virtual assistants become increasingly wide-spread. However, publicly available datasets useful for this area are limited either in their size, linguistic diversity, domain coverage, or annotation granularity. In this paper, we present strategies toward curating and annotating large scale goal oriented dialogue data. We introduce the MultiDoGO dataset to overcome these limitations. With a total of over 81K dialogues harvested across six domains, MultiDoGO is over 8 times the size of MultiWOZ, the other largest comparable dialogue dataset currently available to the public. Over 54K of these harvested conversations are annotated for intent classes and slot labels. We adopt a Wizard-of-Oz approach wherein a crowd-sourced worker (the “customer”) is paired with a trained annotator (the “agent”). The data curation process was controlled via biases to ensure a diversity in dialogue flows following variable dialogue policies. We provide distinct class label tags for agents vs. customer utterances, along with applicable slot labels. We also compare and contrast our strategies on annotation granularity, i.e. turn vs. sentence level. Furthermore, we compare and contrast annotations curated by leveraging professional annotators vs the crowd. We believe our strategies for eliciting and annotating such a dialogue dataset scales across modalities and domains and potentially languages in the future. To demonstrate the efficacy of our devised strategies we establish neural baselines for classification on the agent and customer utterances as well as slot labeling for each domain.