The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

David Stap, Christof Monz


Abstract
Prior research diverges on language diversity in LLM fine-tuning: Some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these disparities. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and—surprisingly—supervised pairs, despite less diverse models being fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity creates more language-agnostic representations. These representational adaptations help explain the improved performance in models fine-tuned with greater diversity.
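To make the notion of "language-agnostic representations" concrete, the sketch below shows one common way such an analysis can be run: embed parallel sentences with a multilingual language model and measure how similar the resulting representations are across languages. This is an illustrative example only, not the authors' analysis; the model name (facebook/xglm-564M), the toy sentence triple, and the mean-pooling choice are placeholder assumptions.

# Illustrative sketch (not the paper's exact method): estimate how
# "language-agnostic" a model's sentence representations are by comparing
# mean-pooled hidden states of parallel sentences across languages.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "facebook/xglm-564M"  # placeholder multilingual LM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Tiny parallel examples (English / German / Dutch) standing in for a
# proper multi-parallel evaluation set such as FLORES.
parallel = {
    "en": "The cat sits on the mat.",
    "de": "Die Katze sitzt auf der Matte.",
    "nl": "De kat zit op de mat.",
}

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

embeddings = {lang: embed(text) for lang, text in parallel.items()}

# Higher average cosine similarity between translations of the same sentence
# suggests more language-neutral (language-agnostic) representations.
langs = list(embeddings)
sims = []
for i in range(len(langs)):
    for j in range(i + 1, len(langs)):
        sims.append(
            torch.cosine_similarity(embeddings[langs[i]], embeddings[langs[j]]).item()
        )
print(f"mean cross-lingual similarity: {sum(sims) / len(sims):.3f}")

Under this kind of probe, comparing the same checkpoint before and after fine-tuning with different numbers of language pairs would indicate whether greater diversity pushes translations of the same sentence closer together in representation space.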
Anthology ID:
2025.findings-emnlp.224
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4199–4211
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.224/
DOI:
10.18653/v1/2025.findings-emnlp.224
Cite (ACL):
David Stap and Christof Monz. 2025. The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 4199–4211, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation (Stap & Monz, Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.224.pdf
Checklist:
2025.findings-emnlp.224.checklist.pdf