Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning

Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz


Abstract
While the reasoning abilities of large language models (LLMs) continue to advance, it remains underexplored how such abilities vary across languages in multilingual LLMs and whether different languages generate distinct reasoning paths. In this work, we show that reasoning traces generated in different languages often provide complementary signals for mathematical reasoning. We propose cross-lingual outcome reward modeling, a framework that ranks candidate reasoning traces across languages rather than within a single language. Our experiments on the MGSM benchmark show that cross-lingual reward modeling improves accuracy by up to 10 points compared to using reward modeling within a single language, benefiting both high- and low-resource languages. Notably, cross-lingual sampling improves English performance under low inference budgets, despite English being the strongest individual language. Our findings reveal new opportunities to improve multilingual reasoning by leveraging the complementary strengths of diverse languages.
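The abstract describes a best-of-N style selection in which candidate reasoning traces are pooled across languages before ranking. Below is a minimal Python sketch of that idea under our own assumptions, not the authors' implementation: the names best_of_l, sample_traces, and score_trace, as well as the stub sampler and scorer, are hypothetical placeholders for a multilingual generator and an outcome reward model.

from dataclasses import dataclass

@dataclass
class Candidate:
    language: str   # language the trace was generated in
    trace: str      # full reasoning trace
    answer: str     # final extracted answer
    reward: float   # scalar score from an outcome reward model

def best_of_l(question, languages, sample_traces, score_trace, n_per_language=4):
    """Pool candidate reasoning traces from all languages, score each with an
    outcome reward model, and return the single top-scoring candidate."""
    pool = []
    for lang in languages:
        for trace, answer in sample_traces(question, lang, n_per_language):
            reward = score_trace(question, trace, answer)
            pool.append(Candidate(lang, trace, answer, reward))
    # Cross-lingual selection: the argmax runs over the union of all languages,
    # rather than separately within each language.
    return max(pool, key=lambda c: c.reward)

# Toy usage with stub sampler/scorer (illustrative only):
if __name__ == "__main__":
    langs = ["en", "de", "sw"]
    stub_sampler = lambda q, lang, n: [(f"[{lang}] step-by-step...", "42")] * n
    stub_scorer = lambda q, trace, ans: float(len(trace))  # placeholder reward
    best = best_of_l("What is 6*7?", langs, stub_sampler, stub_scorer)
    print(best.language, best.answer)

The only change relative to standard per-language best-of-N is that the final argmax is taken over candidates from all languages, which is what allows a strong trace in one language to be selected over weaker traces in another.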
Anthology ID:
2026.findings-eacl.99
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1930–1939
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.99/
Cite (ACL):
Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, and Christof Monz. 2026. Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1930–1939, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning (Rajaee et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.99.pdf
Checklist:
 2026.findings-eacl.99.checklist.pdf