Maya K. Nachesa


2025

kNN For Whisper And Its Effect On Bias And Speaker Adaptation
Maya K. Nachesa | Vlad Niculae
Findings of the Association for Computational Linguistics: NAACL 2025

Speech recognition performance varies by language, domain, and speaker characteristics such as accent, but fine-tuning a model on any of these categories may lead to catastrophic forgetting. Token-level k-nearest-neighbor (kNN) search, first proposed for neural sequence decoders in natural language generation (NLG) and machine translation (MT), is a non-parametric method that instead adapts using inference-time search in an external datastore, without training the underlying model. We show that Whisper, an end-to-end transformer speech model, benefits from kNN. We investigate the differences between the speech and text setups, discuss implications for speaker adaptation, and analyze improvements by gender, accent, and age.
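
To make the adaptation mechanism concrete, below is a minimal sketch of token-level kNN interpolation in the kNN-LM style that the abstract refers to: at each decoding step, the current decoder hidden state queries a datastore of (hidden state, next token) pairs, and the resulting kNN distribution is mixed with the model's own next-token distribution. This is an illustrative sketch, not code from the paper; the function and parameter names (knn_probs, interpolate, lam, temperature) are assumptions, and a real setup would use Whisper decoder states and an approximate-nearest-neighbor index (e.g. FAISS) rather than brute-force search.

```python
# Sketch of token-level kNN interpolation (kNN-LM style).
# Assumes a datastore of decoder hidden states ("keys") paired with the
# token emitted at that step ("values"); names are illustrative only.
import numpy as np

def knn_probs(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Distribution over the vocabulary from the k nearest datastore entries.

    query:  (d,)   decoder hidden state at the current step
    keys:   (n, d) stored hidden states
    values: (n,)   token id emitted after each stored state
    """
    dists = np.linalg.norm(keys - query, axis=1)      # distance to every key
    nearest = np.argsort(dists)[:k]                   # k closest entries
    weights = np.exp(-dists[nearest] / temperature)   # closer keys get more mass
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    np.add.at(probs, values[nearest], weights)        # scatter weight onto tokens
    return probs

def interpolate(model_probs, knn, lam=0.25):
    """Mix the model's next-token distribution with the kNN distribution."""
    return (1.0 - lam) * model_probs + lam * knn

# Toy usage with a random datastore and model distribution.
rng = np.random.default_rng(0)
d, n, vocab = 16, 1000, 50
keys = rng.normal(size=(n, d)).astype(np.float32)
values = rng.integers(0, vocab, size=n)
query = rng.normal(size=d).astype(np.float32)
model_probs = rng.dirichlet(np.ones(vocab))

mixed = interpolate(model_probs, knn_probs(query, keys, values, vocab))
print(mixed.argmax(), round(mixed.sum(), 6))  # predicted token id, total mass ~1
```

Because the underlying model is never updated, swapping or extending the datastore (e.g. with data from a particular accent or speaker group) changes the predictions without the risk of catastrophic forgetting that fine-tuning carries.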