Runfeng Qiao
2025
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
|
Qiuna Tan
|
Guanting Dong
|
MinhuiWu MinhuiWu
|
Chong Sun
|
Xiaoshuai Song
|
Jiapeng Wang
|
Zhuoma GongQue
|
Shanglin Lei
|
YiFan Zhang
|
Zhe Wei
|
Miaoxuan Zhang
|
Runfeng Qiao
|
Xiao Zong
|
Yida Xu
|
Peiqing Yang
|
Zhimin Bao
|
Muxi Diao
|
Chen Li
|
Honggang Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks mainly focus more on the end-to-end performance, but neglect the underlying principles of knowledge acquisition and generalization. Instead, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles. We meticulously collect 6.5K visual math problems and decompose them into 10.9K step-level questions for evaluation, spanning 5 layers of knowledge granularity and 67 hierarchical knowledge concepts. Specifically, we decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric to hierarchically assess inherent issues in LMMs’ reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and provide comprehensive analysis and insight for future development. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. Data and code are available at https://github.com/We-Math/We-Math.
Search
Fix author
Co-authors
- Zhimin Bao 1
- Muxi Diao 1
- Guanting Dong 1
- Zhuoma GongQue 1
- Shanglin Lei 1
- show all...
Venues
- acl1