Demystifying Multilingual Reasoning in Process Reward Modeling

Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch


Abstract
Large language models (LLMs) are designed to perform a wide range of tasks. To improve their ability to solve complex problems requiring multi-step reasoning, recent research leverages process reward modeling to provide fine-grained feedback at each step of the reasoning process for reinforcement learning (RL), but it predominantly focuses on English. In this paper, we tackle the critical challenge of extending process reward models (PRMs) to multilingual settings. To achieve this, we train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Through comprehensive evaluations on two widely used reasoning benchmarks across 11 languages, we demonstrate that multilingual PRMs not only improve average accuracy but also reduce early-stage reasoning errors. Furthermore, our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data, while also uncovering the benefits arising from more candidate responses and trainable parameters. This work opens promising avenues for robust multilingual applications in complex, multi-step reasoning tasks.
Anthology ID:
2025.findings-emnlp.519
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9775–9788
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.519/
DOI:
10.18653/v1/2025.findings-emnlp.519
Cite (ACL):
Weixuan Wang, Minghao Wu, Barry Haddow, and Alexandra Birch. 2025. Demystifying Multilingual Reasoning in Process Reward Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9775–9788, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Demystifying Multilingual Reasoning in Process Reward Modeling (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.519.pdf
Checklist:
2025.findings-emnlp.519.checklist.pdf