EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics

Zhichen Tang, Zhengzheng Dang, Yulin Chen, Jixin Wu, Haiwen Li, Yanming Wang


Abstract
While large language models (LLMs) excel at static scientific reasoning, they struggle to model the temporal structure of dynamic physical processes. We present EvoMD-LLM (Evolutionary Molecular Dynamics Large Language Model), a framework that reformulates species-level molecular dynamics as a symbolic temporal language modeling problem. Reactive MD trajectories are discretized into sequences of molecular events, where each token represents a chemical species augmented with its persistence duration, enabling standard autoregressive LLMs to learn compositional evolution over time through efficient fine-tuning. A key component of EvoMD-LLM is temporal scaffolding, which treats event duration as an explicit linguistic token and serves as a structured inductive bias, significantly reducing invalid or hallucinated molecular outputs compared to conventional sequence modeling approaches. We evaluate EvoMD-LLM on multiple temporal prediction tasks, achieving up to 66.14% accuracy and consistently outperforming sequential neural networks and language-based baselines. Beyond quantitative improvements, we qualitatively observe that the model can generate plausible physical interpretations of reaction dynamics, despite not being explicitly trained for explanation. These results demonstrate that symbolic temporal language modeling provides an effective framework for grounding LLMs in dynamic physical simulations.
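As a rough illustration of the discretization the abstract describes, the sketch below collapses a per-frame species trace into (species, duration) events and then emits the duration as its own token. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `discretize` and `to_tokens`, the `<SP:...>` and `<DUR:...>` token layout, the `dt_fs` timestep parameter, and the example species trace are all hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def discretize(frames: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive identical species labels into (species, n_frames) events.

    Assumption: each MD frame has already been reduced to one species label.
    """
    return [(species, sum(1 for _ in run)) for species, run in groupby(frames)]


def to_tokens(events: List[Tuple[str, int]], dt_fs: float = 1.0) -> List[str]:
    """Emit a species token followed by an explicit duration token per event.

    The token spelling here is hypothetical; the abstract only states that
    persistence duration is treated as an explicit token.
    """
    tokens: List[str] = []
    for species, n_frames in events:
        tokens.append(f"<SP:{species}>")
        tokens.append(f"<DUR:{n_frames * dt_fs:g}fs>")
    return tokens


# Illustrative trace only; not data from the paper.
frames = ["H2O", "H2O", "H2O", "OH", "OH", "H2O2"]
events = discretize(frames)
tokens = to_tokens(events)
print(events)
print(tokens)
```

A sequence built this way can be fed to a standard autoregressive tokenizer, with the duration tokens interleaved so that the model sees how long each species persisted before the next event.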
Anthology ID:
2026.findings-acl.1947
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
39072–39088
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1947/
Cite (ACL):
Zhichen Tang, Zhengzheng Dang, Yulin Chen, Jixin Wu, Haiwen Li, and Yanming Wang. 2026. EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39072–39088, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics (Tang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1947.pdf
Checklist:
2026.findings-acl.1947.checklist.pdf