GLProtein: Global-and-Local Structure Aware Protein Representation Learning

Yunqing Liu, Wenqi Fan, Xiaoyong Wei, Li Qing


Abstract
Proteins are central to biological systems, serving as building blocks across all forms of life. Despite advances in understanding protein function through sequence analysis, the integration of protein structural information remains underexplored. We argue that the structural information of proteins is not limited to their 3D structure; it also spans information from amino acid molecules (local information) to protein-protein structural similarity (global information). To address this, we propose GLProtein, the first framework in protein pre-training that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights. GLProtein combines masked protein modelling with triplet structure similarity scoring, protein 3D distance encoding, and substructure-based amino acid molecule encoding. Experimental results demonstrate that GLProtein outperforms previous methods on several bioinformatics tasks, including protein-protein interaction prediction and contact prediction, among others.
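The abstract does not say how triplet structure similarity scoring is computed. The sketch below is an illustration only and should not be read as the paper's method. It assumes a standard triplet margin loss on protein embeddings. For a given anchor protein, the candidate with the higher structural similarity score to the anchor becomes the positive and the other candidate becomes the negative. The similarity score could be a TM-score, but that choice is an assumption. The names `euclidean`, `triplet_margin_loss`, and `order_by_similarity`, and the default margin of 1.0, are hypothetical and do not come from GLProtein.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the positive embedding is closer to the anchor
    than the negative embedding by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def order_by_similarity(sim_b: float, sim_c: float,
                        b: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical helper: given structure-similarity scores of candidates
    b and c to the anchor (e.g. TM-scores in [0, 1]), return (positive, negative)."""
    return (b, c) if sim_b >= sim_c else (c, b)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    b, c = [1.0, 0.0], [3.0, 4.0]
    # b is assumed to be structurally more similar to the anchor than c.
    pos, neg = order_by_similarity(0.8, 0.3, b, c)
    print(triplet_margin_loss(anchor, pos, neg))
```

In this toy example the structurally closer protein is already closer in embedding space, so the loss is zero. If the roles were swapped, the hinge term would become positive. That positive term is the gradient signal a triplet objective would use to pull structurally similar proteins together in embedding space.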
Anthology ID:
2025.findings-emnlp.233
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4355–4372
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.233/
DOI:
10.18653/v1/2025.findings-emnlp.233
Cite (ACL):
Yunqing Liu, Wenqi Fan, Xiaoyong Wei, and Li Qing. 2025. GLProtein: Global-and-Local Structure Aware Protein Representation Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 4355–4372, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GLProtein: Global-and-Local Structure Aware Protein Representation Learning (Liu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.233.pdf
Checklist:
2025.findings-emnlp.233.checklist.pdf