Yunqiao Yang
2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang | Junting Pan | Linda Wei | Aojun Zhou | Weikang Shi | Zimu Lu | Han Xiao | Yunqiao Yang | Houxing Ren | Mingjie Zhan | Hongsheng Li
Findings of the Association for Computational Linguistics: ACL 2025
Natural language image-caption datasets, widely used for training Large Multimodal Models (LMMs), mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the advancement of current LMMs in multimodal mathematical reasoning. To this end, we propose leveraging code as supervision for cross-modal alignment, since code inherently encodes all information needed to generate the corresponding figures, establishing a precise connection between the two modalities. Specifically, we co-develop our image-to-code model and dataset with a model-in-the-loop approach, resulting in an image-to-code model, FigCodifier, and the ImgCode-8.6M dataset, the largest image-code dataset to date. Furthermore, we utilize FigCodifier to synthesize novel mathematical figures and then construct MM-MathInstruct-3M, a high-quality multimodal math instruction fine-tuning dataset. Finally, we present MathCoder-VL, trained on ImgCode-8.6M for cross-modal alignment and subsequently fine-tuned on MM-MathInstruct-3M for multimodal math problem solving. Our model achieves a new open-source SOTA across all six metrics. Notably, it surpasses GPT-4o and Claude 3.5 Sonnet on the geometry problem-solving subset of MathVista, achieving improvements of 8.9% and 9.2%, respectively.
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
Yunqiao Yang | Houxing Ren | Zimu Lu | Ke Wang | Weikang Shi | Aojun Zhou | Junting Pan | Mingjie Zhan | Hongsheng Li
Findings of the Association for Computational Linguistics: ACL 2025
Recent advances in preference optimization have demonstrated significant potential for improving mathematical reasoning capabilities in large language models (LLMs). While current approaches leverage high-quality pairwise preference data through outcome-based criteria like answer correctness or consistency, they fundamentally neglect the internal logical coherence of responses. To overcome this, we propose Probability-Consistent Preference Optimization (PCPO), a novel framework that establishes dual quantitative metrics for preference selection: (1) surface-level answer correctness and (2) intrinsic token-level probability consistency across responses. Extensive experiments show that PCPO consistently outperforms existing approaches that rely on outcome-only criteria across a diverse range of LLMs and benchmarks. Our code is publicly available at https://github.com/YunqiaoYang/PCPO.