MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee


Abstract
Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences, the textual descriptors of molecules. Despite its success in molecular property prediction, current practices often lead to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by integrating external materials embeddings, which transfer knowledge from scientific literature. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
Anthology ID:
2024.emnlp-main.788
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
14241–14254
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.788/
DOI:
10.18653/v1/2024.emnlp-main.788
Cite (ACL):
Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, and SangKeun Lee. 2024. MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14241–14254, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction (Park et al., EMNLP 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.788.pdf