Abstract
Accurate spelling correction is a critical step in modern search interfaces, especially in an era of mobile devices and speech-to-text input. For services deployed around the world, this poses a significant challenge for multilingual NLP: spelling errors must be caught and corrected in all languages, and even in queries that mix multiple languages. In this paper, we tackle this challenge using multi-teacher distillation. In our approach, a monolingual teacher model is trained for each language/locale, and these individual models are distilled into a single multilingual student model intended to serve all languages/locales. In experiments using open-source data as well as customer data from a worldwide search service, we show that this approach yields highly effective spelling correction models that can meet the tight latency requirements of deployed services.
- Anthology ID:
- 2023.emnlp-industry.15
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Mingxuan Wang, Imed Zitouni
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 142–151
- URL:
- https://aclanthology.org/2023.emnlp-industry.15
- DOI:
- 10.18653/v1/2023.emnlp-industry.15
- Cite (ACL):
- Jingfen Zhang, Xuan Guo, Sravan Bodapati, and Christopher Potts. 2023. Multi-teacher Distillation for Multilingual Spelling Correction. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 142–151, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Multi-teacher Distillation for Multilingual Spelling Correction (Zhang et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.emnlp-industry.15.pdf
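The abstract's core idea, training one monolingual teacher per locale and distilling them all into a single multilingual student, can be sketched as a routing scheme over a distillation loss. The snippet below is a minimal illustration under assumed simplifications, not the paper's actual models: each "teacher" and the "student" are stand-in callables returning a probability distribution over candidate corrections, and the function names (`soft_cross_entropy`, `multi_teacher_loss`) are hypothetical.

```python
import math

def soft_cross_entropy(teacher_probs, student_probs):
    """Distillation loss for one example: cross-entropy H(teacher, student)
    between the teacher's soft targets and the student's predictions."""
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(s + eps)
                for t, s in zip(teacher_probs, student_probs))

def multi_teacher_loss(batch, teachers, student):
    """Average distillation loss over a mixed-locale batch.

    Each (query, locale) example is routed to the monolingual teacher
    for its locale; the single multilingual student is trained to match
    that teacher's output distribution. `teachers` maps locale -> model,
    `student` is the shared multilingual model (both hypothetical here).
    """
    total = 0.0
    for query, locale in batch:
        t_probs = teachers[locale](query)  # soft targets from the locale's teacher
        s_probs = student(query)           # multilingual student's prediction
        total += soft_cross_entropy(t_probs, s_probs)
    return total / len(batch)
```

In a real setup the loss would be minimized with gradient descent over the student's parameters, typically with temperature-scaled logits; the sketch only shows how per-locale teachers supply the targets for one shared student.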