VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning

Jingkun Ma; Runzhe Zhan (詹润哲); Yang Li; Di Sun; Hou Pong Chan; Lidia S. Chao; Derek F. Wong (黄辉)

Abstract

A hallmark of advanced artificial intelligence is the capacity to progress from passive visual perception to the strategic modification of visual information to facilitate complex reasoning. This advanced capability, however, remains critically underdeveloped in current Large Multi-modal Models (LMMs). The deficiency is often masked by evaluation metrics that prioritize final-answer accuracy, creating an illusion of competence where genuine reasoning is absent. Using the domain of geometric problem-solving as a precise instrument, we probe this issue through tasks that require constructing visual aids. To this end, we introduce VisAidMath, a challenging benchmark, and our novel Three-Layered Funnel Evaluation Framework. This framework moves beyond simple accuracy (ACCU) to scrutinize the generation of valid visual aids (PVA) and the soundness of subsequent reasoning steps (SPRS). Our extensive experiments on state-of-the-art models, including Doubao-Seed-1.6 and o4, reveal a profound “Reasoning Illusion”. We observe that high surface-level accuracy conceals a catastrophic failure in the models’ ability to produce valid visual aids or to reason from them. Our findings expose a fundamental schism between visual perception and logical deduction in modern LMMs. We provide a public evaluation platform on CodaBench and release the project homepage.

Anthology ID: 2026.acl-long.1719
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 37057–37103
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1719/
Cite (ACL): Jingkun Ma, Runzhe Zhan, Yang Li, Di Sun, Hou Pong Chan, Lidia S. Chao, and Derek F. Wong. 2026. VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 37057–37103, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning (Ma et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1719.pdf
Checklist: 2026.acl-long.1719.checklist.pdf
