Reducing cohort bias in natural language understanding systems with targeted self-training scheme

Dieu-thu Le, Gabriela Hernandez, Bei Chen, Melanie Bradford


Abstract
Bias in machine learning models can be an issue when the models are trained on particular types of data that do not generalize well, causing underperformance for certain groups of users. In this work, we focus on reducing the bias related to new customers in a digital voice assistant system. Natural language understanding models often perform worse on requests coming from new users than on those from experienced users. To mitigate this problem, we propose a framework that consists of two phases: (1) a fixing phase with four active learning strategies used to identify important samples coming from new users, and (2) a self-training phase where a teacher model trained in the first phase annotates semi-supervised samples to expand the training data with relevant cohort utterances. We describe practical strategies that identify representative cohort-based samples through density clustering and employ implicit customer feedback to improve new customers' experience. We demonstrate the effectiveness of our approach in a real-world, large-scale voice assistant system for two languages, German and French, through both offline experiments and A/B tests.
Anthology ID:
2023.acl-industry.53
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
552–560
URL:
https://aclanthology.org/2023.acl-industry.53
DOI:
10.18653/v1/2023.acl-industry.53
Cite (ACL):
Dieu-thu Le, Gabriela Hernandez, Bei Chen, and Melanie Bradford. 2023. Reducing cohort bias in natural language understanding systems with targeted self-training scheme. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 552–560, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Reducing cohort bias in natural language understanding systems with targeted self-training scheme (Le et al., ACL 2023)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.acl-industry.53.pdf