Shivam Singh


2026

We fine-tune Whisper large-v3 independently on each of the 81 languages in the FLEURS benchmark. Fine-tuning lowers word error rate (WER) for all 81 languages, by nearly 30% on average. The size of the improvement varies widely, however, and a language’s writing system is the best predictor of success: Latin- and Cyrillic-script languages reach single-digit WERs, while languages with unique scripts (Thai, Georgian, Burmese, Khmer) benefit least. We further show that the compression ratio of Whisper’s byte-pair encoding (BPE) tokenizer predicts fine-tuning headroom (Spearman ρ ≈ −0.78), pointing to tokenization as the underlying bottleneck. We will release model weights upon publication.