CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation

Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang, Wen-Cai Ye, Li Liu


Abstract
Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein–ligand compatibility and multi-objective drug-like properties. Existing methods, however, often optimize these constraints in isolation and thus fail to reconcile conflicting objectives (e.g., affinity vs. safety); they also struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framework over molecular sequences that formulates molecular design as conditional denoising guided by heterogeneous structural and property signals. By coupling discrete diffusion with reinforcement learning, the model aligns the generation trajectory with non-differentiable objectives while preserving chemical validity and diversity. The non-autoregressive nature of the diffusion language model further enables iterative refinement of molecular fragments at inference time. Experiments on structure-conditioned, property-conditioned, and dual-conditioned benchmarks demonstrate consistent improvements over state-of-the-art methods in binding affinity, drug-likeness, and success rate, highlighting the effectiveness of our framework. The code is available at https://github.com/Lee612-1/CAGenMol.
Anthology ID:
2026.findings-acl.232
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4720–4738
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.232/
Cite (ACL):
Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang, Wen-Cai Ye, and Li Liu. 2026. CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4720–4738, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.232.pdf
Checklist:
 2026.findings-acl.232.checklist.pdf