Abstract
In productionized machine learning systems, online model performance is known to deteriorate over time when there is a distributional drift between offline training and online application data. As a remedy, models are typically retrained at fixed time intervals, implying high computational and manual costs. This work aims to decrease such costs in productionized, large-scale Spoken Language Understanding systems. In particular, we develop a need-based re-training strategy guided by an efficient drift detector and discuss the challenges that arise, including system complexity, overlapping model releases, observation limitations, and the absence of annotated resources at runtime. We present empirical results on historical data and confirm the utility of our design decisions via an online A/B experiment.
- Anthology ID:
- 2022.emnlp-industry.11
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, UAE
- Editors:
- Yunyao Li, Angeliki Lazaridou
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 121–127
- URL:
- https://aclanthology.org/2022.emnlp-industry.11
- DOI:
- 10.18653/v1/2022.emnlp-industry.11
- Cite (ACL):
- Quynh Do, Judith Gaspers, Daniil Sorokin, and Patrick Lehnen. 2022. Towards Need-Based Spoken Language Understanding Model Updates: What Have We Learned?. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 121–127, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- Towards Need-Based Spoken Language Understanding Model Updates: What Have We Learned? (Do et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/landing_page/2022.emnlp-industry.11.pdf
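The abstract's core idea, triggering retraining only when a drift detector fires rather than on a fixed schedule, can be illustrated with a generic, label-free detector. This is not the paper's detector; it is a minimal sketch using the Population Stability Index (PSI) over binned model confidence scores, a common annotation-free drift signal. All names (`psi`, `needs_retraining`, the 0.2 threshold) are illustrative assumptions, not from the paper.

```python
import math
from typing import Sequence

def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two score samples in [0, 1].

    Bins both samples on a fixed grid and compares the resulting
    histograms; no ground-truth labels are needed, only model scores.
    """
    def hist(scores: Sequence[float]) -> list:
        counts = [0] * bins
        for s in scores:
            # Clamp the top edge so s == 1.0 falls into the last bin.
            counts[min(int(s * bins), bins - 1)] += 1
        total = len(scores)
        # Floor each proportion at eps to avoid log(0) for empty bins.
        return [max(c / total, eps) for c in counts]

    p, q = hist(reference), hist(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def needs_retraining(reference: Sequence[float],
                     current: Sequence[float],
                     threshold: float = 0.2) -> bool:
    """Fire a retraining trigger when drift exceeds the threshold.

    0.2 is a conventional PSI alarm level, used here purely as an example.
    """
    return psi(reference, current) > threshold
```

In a need-based setup, `reference` would hold confidence scores collected near the last model release and `current` a recent window of online traffic; retraining is scheduled only when `needs_retraining` returns `True`, instead of at every fixed interval.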