UniMath-CoT: A Unified Framework for Multimodal Mathematical Reasoning with Re-Inference Affirmation

Zhixiang Lu, Mian Zhou, Angelos Stefanidis, Jionglong Su


Abstract
Large Language Models (LLMs) have achieved considerable success in text-based mathematical reasoning, yet their potential remains underexplored in the multimodal mathematics domain where joint text and image understanding is imperative. A key bottleneck hindering progress is the scarcity of high-quality, genuinely multimodal benchmarks. To address this gap, we construct a unified benchmark by consolidating and curating three public multimodal mathematics datasets. We subsequently propose the UniMath-CoT framework, which establishes a robust performance baseline by combining Chain-of-Thought (CoT) principles with efficient Supervised Fine-Tuning (SFT) based on Low-Rank Adaptation (LoRA). Furthermore, to bolster the model’s reasoning robustness, we introduce an innovative verification mechanism, AARI (Answer Affirmation by Re-Inference), which leverages a specialized re-inference protocol to have the model self-scrutinize and validate its initial conclusions. Our comprehensive experiments show that this integrated strategy substantially boosts performance, surpassing a wide range of open-source models and markedly closing the gap with leading proprietary systems.
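The abstract does not spell out the AARI protocol, so the following is only a minimal sketch of one possible reading of "answer affirmation by re-inference": obtain an initial answer, then prompt the same model again to re-derive the solution and either affirm or correct it. The function name `affirm_by_reinference`, the prompt wording, the `AFFIRM` keyword, and the `max_rounds` parameter are all hypothetical and are not taken from the paper. Image inputs are omitted for brevity, and `generate` stands in for any text-in, text-out model call.

```python
from typing import Callable


def affirm_by_reinference(
    generate: Callable[[str], str],
    question: str,
    max_rounds: int = 2,
) -> str:
    """Hypothetical re-inference loop (an assumption, not the paper's exact protocol)."""
    # First pass: chain-of-thought style prompt for an initial answer.
    answer = generate(
        f"Solve step by step, then give the final answer.\n{question}"
    ).strip()

    for _ in range(max_rounds):
        # Second pass: ask the model to re-derive and check its own answer.
        verdict = generate(
            "Re-derive the solution independently. Reply 'AFFIRM' if the "
            "proposed answer is correct, otherwise reply with a corrected answer.\n"
            f"Question: {question}\nProposed answer: {answer}"
        ).strip()
        if verdict.upper().startswith("AFFIRM"):
            return answer  # the model affirmed its current answer
        answer = verdict  # treat the reply as a corrected answer and re-check

    # No affirmation within the round budget: return the latest answer.
    return answer
```

In this sketch the loop stops as soon as the model affirms its current answer. Capping the number of rounds keeps inference cost bounded when the model keeps revising.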
Anthology ID:
2025.mathnlp-main.13
Volume:
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Marco Valentino, Deborah Ferreira, Mokanarangan Thayaparan, Leonardo Ranaldi, Andre Freitas
Venues:
MathNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
176–185
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.13/
Cite (ACL):
Zhixiang Lu, Mian Zhou, Angelos Stefanidis, and Jionglong Su. 2025. UniMath-CoT: A Unified Framework for Multimodal Mathematical Reasoning with Re-Inference Affirmation. In Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025), pages 176–185, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
UniMath-CoT: A Unified Framework for Multimodal Mathematical Reasoning with Re-Inference Affirmation (Lu et al., MathNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.13.pdf