Tobias Röding


FPI: Failure Point Isolation in Large-scale Conversational Assistants
Rinat Khaziev | Usman Shahid | Tobias Röding | Rakesh Chada | Emir Kapanci | Pradeep Natarajan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track

Large-scale conversational assistants such as Cortana, Alexa, Google Assistant and Siri process requests through a series of modules for wake word detection, speech recognition, language understanding and response generation. An error in one of these modules can cascade through the system. Given the large traffic volumes in these assistants, it is infeasible to manually analyze the data, identify requests with processing errors and isolate the source of error. We present a machine learning system to address this challenge. First, we embed the incoming request and context, such as system response and subsequent turns, using pre-trained transformer models. Then, we combine these embeddings with encodings of additional metadata features (such as confidence scores from different modules in the online system) using a “mixing-encoder” to output the failure point predictions. Our system obtains 92.2% of human performance on this task while scaling to analyze the entire traffic in 8 different languages of a large-scale conversational assistant. We present detailed ablation studies analyzing the impact of different modeling choices.


The impact of domain-specific representations on BERT-based multi-domain spoken language understanding
Judith Gaspers | Quynh Do | Tobias Röding | Melanie Bradford
Proceedings of the Second Workshop on Domain Adaptation for NLP

This paper provides the first experimental study on the impact of using domain-specific representations on a BERT-based multi-task spoken language understanding (SLU) model for multi-domain applications. Our results on a real-world dataset covering three languages indicate that by using domain-specific representations learned adversarially, model performance can be improved across all of the three SLU subtasks domain classification, intent classification and slot filling. Gains are particularly large for domains with limited training data.