Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress
Ayomide Odumakinde, Daniel D’souza, Pat Verga, Beyza Ermis, Sara Hooker
Abstract
Synthetic data has driven recent state-of-the-art advancements, but reliance on a single oracle teacher model can lead to model collapse and bias propagation. These issues are particularly severe in multilingual settings, where no single model excels across all languages. In this study, we propose multilingual arbitration, which exploits performance variations among multiple models for each language. By strategically routing samples through a diverse set of models, each with unique strengths, we mitigate these challenges and enhance multilingual performance. Extensive experiments with state-of-the-art models demonstrate that our approach significantly surpasses single-teacher distillation, achieving up to 80% win rates over proprietary and open-weight models like Gemma 2, Llama 3.1, and Mistral v0.3, with the largest improvements in low-resource languages.
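The abstract's central mechanism, routing each sample to the teacher that is strongest for that sample's language, can be illustrated with a short sketch. Everything below is a hypothetical illustration, not the paper's released code: the `Teacher` class, the `SCORES` table, and `route` are assumed names, and the per-language scores stand in for whatever quality signal (for example, a reward model or held-out evaluations) the arbitration might use.

```python
from dataclasses import dataclass

@dataclass
class Teacher:
    """Stand-in for one model in the teacher pool (API or local inference)."""
    name: str

    def generate(self, prompt: str) -> str:
        # A real implementation would call the underlying model here.
        return f"[{self.name}] completion for: {prompt}"

# Hypothetical per-language quality scores; higher is better.
SCORES = {
    "yor": {"teacher_a": 0.41, "teacher_b": 0.67, "teacher_c": 0.55},
    "fra": {"teacher_a": 0.82, "teacher_b": 0.74, "teacher_c": 0.79},
}

POOL = {name: Teacher(name) for name in ("teacher_a", "teacher_b", "teacher_c")}

def route(prompt: str, lang: str) -> str:
    """Generate with the teacher whose score is highest for this language."""
    best = max(SCORES[lang], key=SCORES[lang].get)
    return POOL[best].generate(prompt)

# Build a synthetic data pool by routing every (prompt, language) pair.
synthetic_pool = [
    (prompt, lang, route(prompt, lang))
    for prompt, lang in [("Translate: ...", "yor"), ("Summarize: ...", "fra")]
]
```

Because routing happens per sample rather than per dataset, the resulting pool inherits each teacher's strengths language by language, which is the abstract's stated lever against single-oracle collapse and bias propagation.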
- Anthology ID: 2025.acl-long.939
- Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 19142–19164
- URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.939/
- Cite (ACL): Ayomide Odumakinde, Daniel D’souza, Pat Verga, Beyza Ermis, and Sara Hooker. 2025. Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19142–19164, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress (Odumakinde et al., ACL 2025)
- PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.939.pdf