kNN For Whisper And Its Effect On Bias And Speaker Adaptation

Maya K. Nachesa, Vlad Niculae


Abstract
Speech recognition performance varies by language, domain, and speaker characteristics such as accent, but fine-tuning a model on any of these categories may lead to catastrophic forgetting. Token-level k nearest neighbor search (kNN), first proposed for neural sequence decoders for natural language generation (NLG) and machine translation (MT), is a non-parametric method that instead adapts using inference-time search in an external datastore, without training the underlying model. We show that Whisper, a transformer end-to-end speech model, benefits from kNN. We investigate the differences between the speech and text setups. We discuss implications for speaker adaptation, and analyze improvements by gender, accent, and age.
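The token-level kNN decoding the abstract describes follows the kNN-LM recipe: cache decoder hidden states as datastore keys with their gold next tokens as values, then at inference interpolate the model's next-token distribution with a distribution over retrieved neighbors. A minimal NumPy sketch of that interpolation step (function name, brute-force search, and the `k`, `temperature`, and `lam` values are illustrative assumptions; the paper's actual Whisper setup and index may differ):

```python
import numpy as np

def knn_interpolate(hidden, keys, values, p_model, k=8, temperature=10.0, lam=0.25):
    """Sketch of one token-level kNN-interpolated decoding step.

    hidden:  decoder hidden state for the current step, shape (d,)
    keys:    datastore keys (cached hidden states), shape (n, d)
    values:  next-token id stored with each key, shape (n,)
    p_model: base model's next-token distribution, shape (vocab,)
    """
    # Brute-force L2 distances to every datastore key; keep the k closest.
    # (A real datastore would use an approximate index instead.)
    dists = np.linalg.norm(keys - hidden, axis=1)
    idx = np.argpartition(dists, k - 1)[:k]

    # Softmax over negative distances -> neighbor weights.
    w = np.exp(-dists[idx] / temperature)
    w /= w.sum()

    # Scatter neighbor mass onto the tokens those neighbors predict.
    # np.add.at accumulates correctly when a token id repeats.
    p_knn = np.zeros_like(p_model)
    np.add.at(p_knn, values[idx], w)

    # Interpolate: lam * kNN distribution + (1 - lam) * model distribution.
    return lam * p_knn + (1 - lam) * p_model
```

Because retrieval happens only at inference time, swapping or extending the datastore adapts the model to a new speaker or domain without any gradient updates, which is what sidesteps catastrophic forgetting.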
Anthology ID:
2025.findings-naacl.369
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6621–6627
URL:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.369/
Cite (ACL):
Maya K. Nachesa and Vlad Niculae. 2025. kNN For Whisper And Its Effect On Bias And Speaker Adaptation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6621–6627, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
kNN For Whisper And Its Effect On Bias And Speaker Adaptation (Nachesa & Niculae, Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.369.pdf