Hao Duan
2026
MCLE-Mol: Empowering LLM with Molecular Comprehension and Low-Cost Continual Evolution for Interpretable Property Prediction
Zhili Pu | Lantian Zhang | Hao Duan | Zhixing Zhang | Keyun Zhu | Yongqi Fan | Ruihui Hou | Tong Ruan | Yun Tang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) offer a new paradigm for molecular property prediction (MPP), yet a semantic gap between natural language and molecular representations limits their ability to capture structure–activity relationships (SAR). Recent approaches inject structure-level information into LLMs, primarily modeling associations based on statistical regularities. However, these methods are prone to misinterpreting coincidental associations as general principles, which creates a bottleneck for predictive performance. To address these challenges, we propose MCLE-Mol, an ML–LLM–Rule collaborative framework for MPP. It bridges the semantic gap by injecting ML-derived substructure attribution values into LLMs, and it uses Context-Calibrated Substructure Attribution Rules (CCSAR) to calibrate these attributions under specific chemical contexts, mitigating such misinterpretation. In addition, MCLE-Mol introduces a low-cost continual evolution strategy that updates CCSAR while keeping model parameters frozen, allowing it to adapt to dynamic chemical spaces. Experiments on multiple benchmark datasets demonstrate that MCLE-Mol outperforms all baselines, successfully resolving the trade-off between predictive performance and interpretability.