Fine-tuning Whisper Across 81 Languages

Shivam Singh; Alex Warstadt

Abstract

We fine-tune Whisper large-v3 independently on each of the 81 languages in the FLEURS benchmark. Fine-tuning improves WER for all 81 languages, reducing it by nearly 30% on average. However, improvement varies widely, and the language’s writing system is the best predictor of success. Latin and Cyrillic script languages reach single-digit WERs, while languages with unique scripts (Thai, Georgian, Burmese, Khmer) benefit least. We further show that Whisper’s BPE compression ratio predicts fine-tuning headroom (Spearman ρ ≈ −0.78), pointing to tokenization as the underlying bottleneck. We will release model weights upon publication.
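The abstract's tokenization analysis can be illustrated with a minimal sketch, assuming that "compression ratio" means characters per BPE token and that headroom is some per-language improvement score; neither definition is confirmed by the abstract. The function names `compression_ratio`, `average_ranks`, and `spearman` are hypothetical, and no figures from the paper are reproduced here.

```python
from statistics import mean


def compression_ratio(text: str, token_ids: list[int]) -> float:
    """Characters per BPE token (an assumed definition, not the paper's)."""
    return len(text) / len(token_ids)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Given one compression ratio and one improvement score per language, `spearman(ratios, improvements)` returns a value in [−1, 1]; a strongly negative value would correspond to the kind of relationship the abstract reports.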

Anthology ID: 2026.scil-main.37
Volume: Proceedings of the Society for Computation in Linguistics 2026
Month: July
Year: 2026
Address: San Diego, CA
Editors: Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues: SCiL | WS
Publisher: Association for Computational Linguistics
Pages: 408–410
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.37/
Cite (ACL): Shivam Singh and Alex Warstadt. 2026. Fine-tuning Whisper Across 81 Languages. In Proceedings of the Society for Computation in Linguistics 2026, pages 408–410, San Diego, CA. Association for Computational Linguistics.
Cite (Informal): Fine-tuning Whisper Across 81 Languages (Singh & Warstadt, SCiL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.37.pdf
