2023
Sharing Encoder Representations across Languages, Domains and Tasks in Large-Scale Spoken Language Understanding
Jonathan Hueser | Judith Gaspers | Thomas Gueudre | Chandana Prakash | Jin Cao | Daniil Sorokin | Quynh Do | Nicolas Anastassacos | Tobias Falke | Turan Gojayev
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Leveraging representations from pre-trained transformer-based encoders achieves state-of-the-art performance on numerous NLP tasks. Larger encoders can improve accuracy for spoken language understanding (SLU) but are challenging to use given the inference latency constraints of online systems (especially on CPU machines). We evaluate using a larger 170M parameter BERT encoder that shares representations across languages, domains and tasks for SLU compared to using smaller 17M parameter BERT encoders with language-, domain- and task-decoupled finetuning. Running inference with a larger shared encoder on GPU is latency neutral and reduces infrastructure cost compared to running inference for decoupled smaller encoders on CPU machines. The larger shared encoder reduces semantic error rates by 4.62% for test sets representing user requests to voice-controlled devices and 5.79% on the tail of the test sets on average across four languages.
2022
Towards Need-Based Spoken Language Understanding Model Updates: What Have We Learned?
Quynh Do | Judith Gaspers | Daniil Sorokin | Patrick Lehnen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
In productionized machine learning systems, online model performance is known to deteriorate over time when there is a distributional drift between offline training and online application data. As a remedy, models are typically retrained at fixed time intervals, implying high computational and manual costs. This work aims at decreasing such costs in productionized, large-scale Spoken Language Understanding systems. In particular, we develop a need-based re-training strategy guided by an efficient drift detector and discuss the arising challenges including system complexity, overlapping model releases, observation limitation and the absence of annotated resources at runtime. We present empirical results on historical data and confirm the utility of our design decisions via an online A/B experiment.
Distributionally Robust Finetuning BERT for Covariate Drift in Spoken Language Understanding
Samuel Broscheit | Quynh Do | Judith Gaspers
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this study, we investigate robustness against covariate drift in spoken language understanding (SLU). Covariate drift can occur in SLU when there is a drift between training and testing regarding what users request or how they request it. To study this we propose a method that exploits natural variations in data to create a covariate drift in SLU datasets. Experiments show that a state-of-the-art BERT-based model suffers performance loss under this drift. To mitigate the performance loss, we investigate distributionally robust optimization (DRO) for finetuning BERT-based models. We discuss some recent DRO methods, propose two new variants and empirically show that DRO improves robustness under drift.
2021
The impact of domain-specific representations on BERT-based multi-domain spoken language understanding
Judith Gaspers | Quynh Do | Tobias Röding | Melanie Bradford
Proceedings of the Second Workshop on Domain Adaptation for NLP
This paper provides the first experimental study on the impact of using domain-specific representations on a BERT-based multi-task spoken language understanding (SLU) model for multi-domain applications. Our results on a real-world dataset covering three languages indicate that by using domain-specific representations learned adversarially, model performance can be improved across all three SLU subtasks: domain classification, intent classification and slot filling. Gains are particularly large for domains with limited training data.
2020
To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?
Quynh Do | Judith Gaspers | Tobias Roeding | Melanie Bradford
Proceedings of the 28th International Conference on Computational Linguistics
This paper addresses the question as to what degree a BERT-based multilingual Spoken Language Understanding (SLU) model can transfer knowledge across languages. Through experiments, we show that, although it works substantially well even on distant language groups, there is still a gap to the ideal multilingual performance. In addition, we propose a novel BERT-based adversarial model architecture to learn language-shared and language-specific representations for multilingual SLU. Our experimental results prove that the proposed model is capable of narrowing the gap to the ideal multilingual performance.
2019
Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding
Quynh Do | Judith Gaspers
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
A typical cross-lingual transfer learning approach to boost model performance on a language is to pre-train the model on all available supervised data from another language. However, in large-scale systems this leads to high training times and computational requirements. In addition, characteristic differences between the source and target languages raise a natural question of whether source data selection can improve the knowledge transfer. In this paper, we address this question and propose a simple but effective language-model-based source-language data selection method for cross-lingual transfer learning in large-scale spoken language understanding. The experimental results show that with data selection i) source data and hence training time are reduced significantly and ii) model performance is improved.