Fairness in Automatic Speech Recognition Isn’t a One-Size-Fits-All

Hend ElGhazaly, Bahman Mirheidari, Heidi Christensen, Nafise Sadat Moosavi


Abstract
Modern Automatic Speech Recognition (ASR) systems are increasingly deployed in high-stakes settings, including clinical interviews, public services, and educational tools, where equitable performance across speaker groups is essential. While pre-trained speech models like Whisper achieve strong overall accuracy, they often exhibit inconsistent group-level performance that varies across domains. These disparities are not fixed properties of the model but emerge from the interaction between model, data, and task, posing challenges for fairness interventions designed in-domain.

We frame fairness in ASR as a generalisation problem. We fine-tune a Whisper model on the Fair-Speech corpus using four strategies: basic fine-tuning, demographic rebalancing, gender-swapped data augmentation, and a novel contrastive learning objective that encourages gender-invariant representations. We evaluate performance across multiple aspects of fairness and utility, both in-domain and on three out-of-domain test sets: LibriSpeech, EdAcc, and CognoSpeak.

Our findings show that the method with the best in-domain fairness performed worst out-of-domain, illustrating that fairness gains do not always generalise. Demographic balancing generalises more consistently, while our contrastive method offers a practical alternative: it achieves stable, cross-domain fairness improvements without requiring changes to the training data distribution, and with minimal accuracy trade-offs.
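The abstract does not spell out the contrastive objective itself. As a rough illustration only, the sketch below shows one standard way to encourage gender-invariant encoder representations: an InfoNCE-style loss over paired utterances (e.g. original vs. gender-swapped audio of the same text), added to the usual ASR loss during fine-tuning. The function name, the pairing scheme, and the weighting term `lambda_c` are assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch (not the authors' code): an InfoNCE-style contrastive
# term that pulls together pooled encoder embeddings of paired utterances
# with the same content but different perceived speaker gender.
import torch
import torch.nn.functional as F

def gender_invariance_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """emb_a, emb_b: (batch, dim) pooled encoder embeddings of paired
    utterances. Row i of emb_a is the positive for row i of emb_b;
    all other rows in the batch serve as in-batch negatives."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric InfoNCE: match a -> b and b -> a.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Assumed combined objective during fine-tuning (lambda_c is a
# hypothetical weighting hyperparameter):
#   total_loss = asr_ce_loss + lambda_c * gender_invariance_contrastive_loss(emb_m, emb_f)
```

Under this setup, minimising the contrastive term makes the encoder map content-matched utterances to nearby embeddings regardless of speaker gender, which is one plausible reading of "gender-invariant representations".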
Anthology ID:
2025.findings-emnlp.1044
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19169–19178
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1044/
DOI:
10.18653/v1/2025.findings-emnlp.1044
Cite (ACL):
Hend ElGhazaly, Bahman Mirheidari, Heidi Christensen, and Nafise Sadat Moosavi. 2025. Fairness in Automatic Speech Recognition Isn’t a One-Size-Fits-All. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19169–19178, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Fairness in Automatic Speech Recognition Isn’t a One-Size-Fits-All (ElGhazaly et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1044.pdf
Checklist:
2025.findings-emnlp.1044.checklist.pdf