Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress

Ayomide Odumakinde, Daniel D’souza, Pat Verga, Beyza Ermis, Sara Hooker


Abstract
Synthetic data has driven recent state-of-the-art advancements, but reliance on a single oracle teacher model can lead to model collapse and bias propagation. These issues are particularly severe in multilingual settings, where no single model excels across all languages. In this study, we propose multilingual arbitration, which exploits performance variations among multiple models for each language. By strategically routing samples through a diverse set of models, each with unique strengths, we mitigate these challenges and enhance multilingual performance. Extensive experiments with state-of-the-art models demonstrate that our approach significantly surpasses single-teacher distillation, achieving up to 80% win rates over proprietary and open-weight models like Gemma 2, Llama 3.1, and Mistral v0.3, with the largest improvements in low-resource languages.
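The routing idea described in the abstract can be illustrated with a short sketch. The following Python example shows one simple way to build a synthetic data pool by sending each prompt to the teacher model estimated to be strongest for that prompt's language. The model names, the score function, and the data structures are hypothetical stand-ins for illustration only, not the authors' implementation; the paper itself should be consulted for the actual routing strategies and scoring.

from typing import Callable, Dict, List

# Hypothetical teacher pool: each name maps to a generate() callable.
# In practice these would wrap API or local inference calls; here they are stubs.
TeacherFn = Callable[[str], str]

def route_and_generate(
    prompts_by_lang: Dict[str, List[str]],
    teachers: Dict[str, TeacherFn],
    score: Callable[[str, str], float],  # score(model_name, lang) -> estimated quality
) -> List[dict]:
    """Build a synthetic data pool by routing each prompt to the teacher
    with the highest estimated quality for that prompt's language."""
    pool = []
    for lang, prompts in prompts_by_lang.items():
        # Pick the teacher estimated to be strongest for this language.
        best_model = max(teachers, key=lambda name: score(name, lang))
        for prompt in prompts:
            completion = teachers[best_model](prompt)
            pool.append({
                "language": lang,
                "teacher": best_model,
                "prompt": prompt,
                "completion": completion,
            })
    return pool

if __name__ == "__main__":
    # Toy stand-ins: fixed per-language quality scores and echo-style teachers.
    scores = {("teacher_a", "yo"): 0.8, ("teacher_a", "fr"): 0.4,
              ("teacher_b", "yo"): 0.3, ("teacher_b", "fr"): 0.9}
    teachers = {
        "teacher_a": lambda p: f"[teacher_a completion for: {p}]",
        "teacher_b": lambda p: f"[teacher_b completion for: {p}]",
    }
    prompts = {"yo": ["Translate 'hello' to Yoruba."],
               "fr": ["Write a short poem about the sea."]}
    data_pool = route_and_generate(prompts, teachers, lambda m, l: scores[(m, l)])
    for row in data_pool:
        print(row["language"], "->", row["teacher"])

In this toy setup, Yoruba prompts are routed to teacher_a and French prompts to teacher_b, so the resulting pool mixes completions from whichever model is estimated to be strongest per language rather than relying on a single oracle teacher.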
Anthology ID:
2025.acl-long.939
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
19142–19164
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.939/
Cite (ACL):
Ayomide Odumakinde, Daniel D’souza, Pat Verga, Beyza Ermis, and Sara Hooker. 2025. Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19142–19164, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress (Odumakinde et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.939.pdf