GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization

Vishal Dey, Xiao Hu, Xia Ning


Abstract
Despite recent advancements, most computational methods for molecule optimization are constrained to single- or double-property optimization tasks and suffer from poor scalability and generalizability to novel optimization tasks. Meanwhile, Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. To demonstrate LLMs' potential for molecule optimization, we introduce MuMOInstruct, the first high-quality instruction-tuning dataset specifically focused on multi-property molecule optimization tasks. Leveraging MuMOInstruct, we develop GeLLM³Os, a series of instruction-tuned LLMs for molecule optimization. Extensive evaluations across 5 in-domain and 5 out-of-domain tasks demonstrate that GeLLM³Os consistently outperform state-of-the-art baselines. GeLLM³Os also exhibit outstanding zero-shot generalization to unseen tasks, significantly outperforming powerful closed-source LLMs. Such strong generalizability demonstrates the tremendous potential of GeLLM³Os as foundational models for molecule optimization, enabling novel optimization tasks to be tackled without resource-intensive retraining. MuMOInstruct and code are accessible through https://github.com/ninglab/GeLLMO.
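
Since the GeLLM³O models are instruction-tuned LLMs, a natural way to apply them to an unseen multi-property task is zero-shot prompting. The sketch below is a minimal illustration only, assuming a Hugging Face-compatible checkpoint; the checkpoint path, prompt wording, and property combination are hypothetical placeholders rather than the paper's released templates (see the GitHub repository above for the actual models and the MuMOInstruct data format).

# Minimal sketch (not the paper's released code): zero-shot prompting of an
# instruction-tuned molecule-optimization LLM via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/gellmo-checkpoint"  # hypothetical placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A hypothetical multi-property instruction: improve several properties of a
# seed molecule at once and return a modified SMILES string.
prompt = (
    "Modify the molecule below to increase its drug-likeness (QED) and "
    "reduce its synthetic accessibility score, while preserving its scaffold.\n"
    "Molecule (SMILES): CC(=O)Oc1ccccc1C(=O)O\n"
    "Optimized molecule (SMILES):"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens (the proposed optimized SMILES).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))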
Anthology ID: 2025.acl-long.1225
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 25192–25221
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1225/
Cite (ACL): Vishal Dey, Xiao Hu, and Xia Ning. 2025. GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25192–25221, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization (Dey et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1225.pdf