Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

Yibo Yan, Shen Wang, Jiahao Huo, Jingheng Ye, Zhendong Chu, Xuming Hu, Philip S. Yu, Carla P Gomes, Bart Selman, Qingsong Wen


Abstract
Scientific reasoning, the process through which humans apply logic, evidence, and critical thinking to explore and interpret scientific phenomena, is essential for advancing knowledge across diverse fields. However, despite significant progress, current scientific reasoning models still struggle to generalize across domains and often fall short in multimodal perception. Multimodal Large Language Models (MLLMs), which integrate text, images, and other modalities, present an exciting opportunity to overcome these limitations and enhance scientific reasoning. Therefore, **this position paper argues that MLLMs can significantly advance scientific reasoning across disciplines such as mathematics, physics, chemistry, and biology**. We highlight the current state of MLLM applications in scientific reasoning, noting their ability to integrate and reason over diverse data types. However, challenges such as multimodal alignment, data diversity, and reasoning depth remain obstacles to achieving their full potential. To address these challenges, we propose actionable suggestions for the near future. Overall, our work offers a novel perspective on integrating MLLMs with scientific reasoning, providing the LLM community with valuable insights toward achieving Artificial General Intelligence (AGI).
Anthology ID:
2026.findings-acl.1228
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24535–24574
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1228/
Cite (ACL):
Yibo Yan, Shen Wang, Jiahao Huo, Jingheng Ye, Zhendong Chu, Xuming Hu, Philip S. Yu, Carla P Gomes, Bart Selman, and Qingsong Wen. 2026. Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24535–24574, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning (Yan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1228.pdf
Checklist:
2026.findings-acl.1228.checklist.pdf