CheMM-R1: Enhancing Chemical Structure Recognition and Elucidation with Reasoning Multimodal Large Language Models

Liting Huang, Zhihao Zhang, Shoujin Wang


Abstract
While Multimodal Large Language Models (MLLMs) demonstrate strong reasoning capabilities, they lack the domain-specific expertise needed to perform chemical tasks effectively. For example, existing MLLMs struggle with both the lower-level task of molecular structure recognition and the higher-level task of chemical spectral data elucidation. When faced with complex molecular structures and multimodal chemical data (including spectral images and text), they often fail to provide reliable inferences, resulting in poor performance. Moreover, there are no benchmark datasets for evaluating multi-step multimodal reasoning capabilities in the chemistry domain. To this end, we establish CheMM-Bench, a comprehensive benchmark dataset with 48,500 reasoning steps across four chemical tasks (SmilesQA, IupacQA, MwQA, SpectraQA) for evaluating visual reasoning in both molecular structure recognition and spectral analysis. Building on this benchmark, we present CheMM-R1, a state-of-the-art chemistry-specific MLLM trained with CheMMGRPO, a novel adaptation of Group Relative Policy Optimisation tailored for chemical reasoning. CheMMGRPO employs domain-specific reward functions to assess chemical validity, structural accuracy, format compliance, and factual correctness. CheMM-R1 surpasses leading proprietary models (GPT-o3, Gemini-2.5-Pro, Claude-3.5-Sonnet, and Grok-2) across all CheMM-Bench tasks. The evaluation code and model are publicly available.
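The abstract names four reward criteria and a group-relative optimisation scheme but gives no formulas. The following is a minimal Python sketch of how such a scheme is commonly set up, not the authors' implementation: the equal weights, the weighted-sum combination, the function names, and the use of the population standard deviation are all assumptions made for illustration. It shows only the generic Group Relative Policy Optimisation step of normalising each sampled response's reward against the other responses drawn for the same prompt.

```python
from statistics import mean, pstdev

# Hypothetical equal weights: the abstract lists the four criteria
# but does not say how they are combined.
WEIGHTS = {
    "validity": 0.25,   # chemical validity
    "structure": 0.25,  # structural accuracy
    "format": 0.25,     # format compliance
    "factual": 0.25,    # factual correctness
}


def total_reward(scores: dict) -> float:
    """Combine per-criterion scores (each assumed to lie in [0, 1]) into one scalar."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalise rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for doing better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical responses to the same question, scored per criterion.
    group = [
        {"validity": 1.0, "structure": 1.0, "format": 1.0, "factual": 1.0},
        {"validity": 1.0, "structure": 0.0, "format": 1.0, "factual": 0.0},
        {"validity": 0.0, "structure": 0.0, "format": 0.0, "factual": 0.0},
    ]
    rewards = [total_reward(s) for s in group]
    print(group_relative_advantages(rewards))
```

Because the advantages are centred on the group mean, a group in which every response receives the same reward yields all-zero advantages and therefore no learning signal for that prompt.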
Anthology ID:
2026.findings-acl.1341
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
26902–26921
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1341/
Cite (ACL):
Liting Huang, Zhihao Zhang, and Shoujin Wang. 2026. CheMM-R1: Enhancing Chemical Structure Recognition and Elucidation with Reasoning Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26902–26921, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CheMM-R1: Enhancing Chemical Structure Recognition and Elucidation with Reasoning Multimodal Large Language Models (Huang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1341.pdf
Checklist:
2026.findings-acl.1341.checklist.pdf