MMATH: A Multilingual Benchmark for Mathematical Reasoning

Wenyang Luo, Xin Zhao, Jing Sha, Shijin Wang, Ji-Rong Wen


Abstract
The advent of large reasoning models, such as OpenAI o1 and DeepSeek R1, has significantly advanced complex reasoning tasks. However, their capabilities in multilingual complex reasoning remain underexplored, with existing efforts largely focused on simpler tasks like MGSM. To address this gap, we introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math problems across 10 typologically diverse languages. Using MMATH, we observe that even advanced models like DeepSeek R1 exhibit substantial performance disparities across languages and suffer from a critical off-target issue: generating responses in unintended languages. To address this, we explore strategies including prompting and training, demonstrating that reasoning in English and answering in target languages can simultaneously enhance performance and preserve target-language consistency. Our findings offer new insights and practical strategies for advancing the multilingual reasoning capabilities of large language models. Our code and data can be found at https://github.com/RUCAIBox/MMATH.
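The abstract's core strategy, reasoning in English while answering in the target language, can be realized at inference time with a simple prompt. Below is a minimal sketch assuming an OpenAI-compatible chat API; the prompt wording, model name, and helper function are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the "reason in English, answer in the target language"
# prompting strategy described in the abstract. The prompt wording and model
# name are illustrative assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def solve_multilingual(problem: str, target_language: str) -> str:
    """Ask the model to reason step by step in English, then answer in the target language."""
    system_prompt = (
        "You are a careful mathematical reasoner. "
        "Carry out your step-by-step reasoning in English, "
        f"then state the final answer in {target_language} only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper evaluates models such as DeepSeek R1
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": problem},
        ],
    )
    return response.choices[0].message.content


# Example: a problem posed in German, answered in German after English reasoning.
print(solve_multilingual("Wie viele Primzahlen liegen zwischen 10 und 30?", "German"))
```

Constraining only the final answer to the target language is what lets this approach preserve target-language consistency (avoiding the off-target issue) while keeping the reasoning in the model's strongest language.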
Anthology ID:
2025.findings-emnlp.598
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11187–11202
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.598/
DOI:
10.18653/v1/2025.findings-emnlp.598
Cite (ACL):
Wenyang Luo, Xin Zhao, Jing Sha, Shijin Wang, and Ji-Rong Wen. 2025. MMATH: A Multilingual Benchmark for Mathematical Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11187–11202, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MMATH: A Multilingual Benchmark for Mathematical Reasoning (Luo et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.598.pdf
Checklist:
2025.findings-emnlp.598.checklist.pdf