@inproceedings{ye-etal-2025-mmscibench,
    title = "{MMS}ci{B}ench: Benchmarking Language Models on {C}hinese Multimodal Scientific Problems",
    author = "Ye, Xinwu  and
      Li, Chengfan  and
      Chen, Siming  and
      Wei, Wei  and
      Tang, Robert",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.755/",
    doi = "10.18653/v1/2025.findings-acl.755",
    pages = "14621--14663",
    ISBN = "979-8-89176-256-5",
    abstract = "Recent advances in large language models (LLMs) and vision-language models (LVLMs) have shown promise across many tasks, yet their scientific reasoning capabilities remain untested, particularly in multimodal settings. We present MMSciBench, a benchmark for evaluating mathematical and physical reasoning through text-only and text-image formats, with human-annotated difficulty levels, solutions with detailed explanations, and taxonomic mappings. Evaluation of state-of-the-art models reveals significant limitations, with even the best model achieving only 63.77{\%} accuracy and particularly struggling with visual reasoning tasks. Our analysis exposes critical gaps in complex reasoning and visual-textual integration, establishing MMSciBench as a rigorous standard for measuring progress in multimodal scientific understanding. The code for MMSciBench is open-sourced at GitHub, and the dataset is available at Hugging Face."
}