MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems

Xinwu Ye, Chengfan Li, Siming Chen, Wei Wei, Robert Tang


Abstract
Recent advances in large language models (LLMs) and large vision-language models (LVLMs) have shown promise across many tasks, yet their scientific reasoning capabilities remain untested, particularly in multimodal settings. We present MMSciBench, a benchmark for evaluating mathematical and physical reasoning through text-only and text-image formats, with human-annotated difficulty levels, solutions with detailed explanations, and taxonomic mappings. Evaluation of state-of-the-art models reveals significant limitations: even the best model achieves only 63.77% accuracy, and models struggle particularly with visual reasoning tasks. Our analysis exposes critical gaps in complex reasoning and visual-textual integration, establishing MMSciBench as a rigorous standard for measuring progress in multimodal scientific understanding. The code for MMSciBench is open-sourced on GitHub, and the dataset is available on Hugging Face.
Anthology ID:
2025.findings-acl.755
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14621–14663
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.755/
Cite (ACL):
Xinwu Ye, Chengfan Li, Siming Chen, Wei Wei, and Robert Tang. 2025. MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14621–14663, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems (Ye et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.755.pdf