Fine-tuning Whisper Across 81 Languages

Shivam Singh, Alex Warstadt


Abstract
We fine-tune Whisper large-v3 independently on each of the 81 languages in the FLEURS benchmark. Fine-tuning improves WER for all 81 languages, reducing it by nearly 30% on average. However, the improvement varies widely, and a language’s writing system is the best predictor of success. Latin- and Cyrillic-script languages reach single-digit WERs, while languages with unique scripts (Thai, Georgian, Burmese, Khmer) benefit least. We further show that Whisper’s BPE compression ratio predicts fine-tuning headroom (Spearman ρ ≈ −0.78), pointing to tokenization as the underlying bottleneck. We will release model weights upon publication.
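The compression-ratio analysis in the abstract can be illustrated with a minimal sketch. The abstract does not define the compression ratio, so the definition used here (UTF-8 bytes per token) is an assumption, as are the helper names `compression_ratio`, `average_ranks`, and `spearman_rho`. The sign of the resulting correlation depends on how both the ratio and the headroom are defined, so this sketch is not expected to reproduce the reported value.

```python
from typing import Callable, Sequence


def compression_ratio(text: str, tokenize: Callable[[str], Sequence]) -> float:
    """UTF-8 bytes per token. This definition is an assumption, not the paper's."""
    n_tokens = len(tokenize(text))
    if n_tokens == 0:
        raise ValueError("tokenizer produced no tokens")
    return len(text.encode("utf-8")) / n_tokens


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5
```

In use, `tokenize` would be any callable that maps text to a token sequence, for example the encode method of a Whisper tokenizer. One ratio per language would then be paired with that language's WER improvement and passed to `spearman_rho`.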
Anthology ID:
2026.scil-main.37
Volume:
Proceedings of the Society for Computation in Linguistics 2026
Month:
July
Year:
2026
Address:
San Diego, CA
Editors:
Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues:
SCiL | WS
Publisher:
Association for Computational Linguistics
Pages:
408–410
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.37/
Cite (ACL):
Shivam Singh and Alex Warstadt. 2026. Fine-tuning Whisper Across 81 Languages. In Proceedings of the Society for Computation in Linguistics 2026, pages 408–410, San Diego, CA. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning Whisper Across 81 Languages (Singh & Warstadt, SCiL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.37.pdf