Beyond WER: Probing Whisper’s Sub‐token Decoder Across Diverse Language Resource Levels

Siyu Liang, Nicolas Ballier, Gina-Anne Levow, Richard Wright


Abstract
While large multilingual automatic speech recognition (ASR) models achieve remarkable performance, the internal mechanisms of the end-to-end pipeline, particularly concerning fairness and efficacy across languages, remain underexplored. This paper introduces a fine-grained analysis of Whisper’s multilingual decoder, examining its sub-token hypotheses during transcription across languages with various resource levels. Our method traces the beam search path, capturing sub-token guesses and their associated probabilities. Results reveal that higher-resource languages benefit from a higher likelihood of the correct token being top-ranked, greater confidence, lower predictive entropy, and more diverse alternative candidates. Lower-resource languages fare worse on these metrics, but also exhibit distinct clustering patterns in sub-token usage in our PCA and t-SNE analyses, patterns that are sometimes influenced by typology. This sub-token probing uncovers systematic decoding disparities masked by aggregate error rates and points towards targeted interventions to ameliorate the imbalanced development of speech technology.
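
As an illustration of the per-step quantities named in the abstract (rank of the correct token, confidence, and predictive entropy), the following is a minimal sketch, not the authors' implementation. It assumes that the decoder's raw scores (logits) over the vocabulary are available at a given decoding step and that the reference token id is known; the function names `softmax` and `step_metrics` are hypothetical, and the toy three-entry vocabulary stands in for Whisper's real one.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_metrics(logits, reference_id):
    """Hypothetical helper: metrics for a single decoding step.

    rank       -- 1 if the reference token is top-ranked, 2 if second, etc.
                  (ties are resolved in favour of the reference token)
    confidence -- probability assigned to the most likely token
    entropy    -- Shannon entropy of the distribution, in nats
    """
    probs = softmax(logits)
    ref_p = probs[reference_id]
    rank = 1 + sum(1 for p in probs if p > ref_p)
    confidence = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return {"rank": rank, "confidence": confidence, "entropy": entropy}


if __name__ == "__main__":
    # Toy example: a peaked distribution versus a flat one.
    print(step_metrics([4.0, 1.0, 0.0], reference_id=0))
    print(step_metrics([0.0, 0.0, 0.0], reference_id=2))
```

Under these assumptions, a peaked distribution yields high confidence and low entropy, while a flat one yields the opposite; averaging such per-step values per language is one plausible way to compare resource levels.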
Anthology ID:
2025.emnlp-main.1591
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
31225–31235
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1591/
Cite (ACL):
Siyu Liang, Nicolas Ballier, Gina-Anne Levow, and Richard Wright. 2025. Beyond WER: Probing Whisper’s Sub‐token Decoder Across Diverse Language Resource Levels. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31225–31235, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Beyond WER: Probing Whisper’s Sub‐token Decoder Across Diverse Language Resource Levels (Liang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1591.pdf
Checklist:
 2025.emnlp-main.1591.checklist.pdf