FACT: Functional Group Alignment and Consistency in Token Space for Structure-aware Molecular Representation Learning

Hyeonyeong Nam, Woojae Choi, Deok-Joong Lee, Young-Han Son, Sangwoon Lee, Bogyeong Kang, Eunjung Jo, Tae-Eui Kam


Abstract
Molecular representation learning aims to capture chemically meaningful structures for various downstream tasks such as accurate molecular property prediction. However, incorporating functional group (FG) information into SMILES-based models remains challenging. The absence of explicit alignment between graph-defined FG atom sets and tokens in sequence prevents complete substructure masking, while multiple valid SMILES forms of the same molecule lead to inconsistent FG representations in token space. To address these challenges, we propose FACT (Functional Group Alignment and Consistency in Token Space), an end-to-end framework for structure-aware SMILES-based representation learning. FACT introduces an atom–token alignment module for complete FG span masking during pre-training and enforces FG consistency across different SMILES forms during fine-tuning. Experiments on MoleculeNet benchmarks show that FACT achieves state-of-the-art or competitive performance on eight tasks, demonstrating the effectiveness of alignment and consistency learning for molecular representation.
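The abstract's observation that graph-defined FG atom sets are not explicitly aligned with SMILES tokens can be made concrete with a small sketch. This is not FACT's atom–token alignment module. It is a hypothetical illustration that assumes a regex SMILES tokenizer (a common choice for SMILES language models), atom indices numbered in order of appearance in the SMILES string (as when a molecule is parsed directly from that string), and FG atom sets supplied from elsewhere, for example by substructure matching on the molecular graph. The names `tokenize`, `fg_mask_positions`, `mask_fg`, and the `"[MASK]"` token are all hypothetical.

```python
import re

# Hypothetical regex tokenizer: one token per bracket atom, two-letter halogen,
# organic-subset atom, bond, branch symbol, or ring-closure label.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|B|C|N|O|S|P|F|I|b|c|n|o|s|p|\*")
BOND_TOKENS = {"-", "=", "#", ":", "/", "\\"}


def tokenize(smiles):
    tokens = SMILES_REGEX.findall(smiles)
    # Refuse input with characters the regex skipped, since that would misalign atoms.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def is_atom(tok):
    return ATOM_RE.fullmatch(tok) is not None


def fg_mask_positions(tokens, fg_atoms):
    """Token positions covering one FG: its atom tokens plus explicit bond
    symbols whose two endpoint atoms both belong to the FG."""
    mask = set()
    atom_idx = -1
    prev_atom = None      # atom that the next explicit bond attaches to
    branch_stack = []     # prev_atom saved at each '('
    pending_bond = None   # position of a bond symbol awaiting its second atom
    for pos, tok in enumerate(tokens):
        if is_atom(tok):
            atom_idx += 1
            if atom_idx in fg_atoms:
                mask.add(pos)
                if pending_bond is not None and prev_atom in fg_atoms:
                    mask.add(pending_bond)
            prev_atom, pending_bond = atom_idx, None
        elif tok == "(":
            branch_stack.append(prev_atom)
        elif tok == ")":
            prev_atom = branch_stack.pop()
        elif tok in BOND_TOKENS:
            pending_bond = pos
        elif tok == ".":
            # Disconnected fragment: no bond to the previous atom.
            prev_atom, pending_bond = None, None
        else:
            # Ring-closure labels (and other symbols): this sketch does not
            # mask ring-closure bonds, so any pending bond is dropped.
            pending_bond = None
    return sorted(mask)


def mask_fg(smiles, fg_atoms, mask_token="[MASK]"):
    tokens = tokenize(smiles)
    positions = set(fg_mask_positions(tokens, set(fg_atoms)))
    return [mask_token if i in positions else t for i, t in enumerate(tokens)]


# Two valid SMILES forms of acetic acid; the carboxyl group is given by atom indices.
print(mask_fg("CC(=O)O", {1, 2, 3}))  # ['C', '[MASK]', '(', '[MASK]', '[MASK]', ')', '[MASK]']
print(mask_fg("OC(C)=O", {0, 1, 3}))  # ['[MASK]', '[MASK]', '(', 'C', ')', '[MASK]', '[MASK]']
```

In both forms the whole carboxyl group, including its `=` bond, is masked, but it occupies different token positions and contiguity patterns. This illustrates why FG representations can vary across SMILES forms of the same molecule. Leaving ring-closure bonds unmasked is a simplification made only to keep the sketch short.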
Anthology ID:
2026.bionlp-1.56
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
695–703
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.56/
Cite (ACL):
Hyeonyeong Nam, Woojae Choi, Deok-Joong Lee, Young-Han Son, Sangwoon Lee, Bogyeong Kang, Eunjung Jo, and Tae-Eui Kam. 2026. FACT: Functional Group Alignment and Consistency in Token Space for Structure-aware Molecular Representation Learning. In BioNLP 2026, pages 695–703, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
FACT: Functional Group Alignment and Consistency in Token Space for Structure-aware Molecular Representation Learning (Nam et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.56.pdf