Yunqing Liu
2025
GLProtein: Global-and-Local Structure Aware Protein Representation Learning
Yunqing Liu | Wenqi Fan | Xiaoyong Wei | Li Qing
Findings of the Association for Computational Linguistics: EMNLP 2025
Proteins are central to biological systems, serving as building blocks across all forms of life. Despite advances in understanding protein function through protein sequence analysis, there remains untapped potential in integrating protein structural information. We argue that the structural information of proteins is not limited to their 3D coordinates but also spans amino acid molecules (local information) to protein-protein structure similarity (global information). To address this, we propose GLProtein, the first protein pre-training framework that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights. GLProtein innovatively combines protein-masked modelling with triplet structure similarity scoring, protein 3D distance encoding and substructure-based amino acid molecule encoding. Experimental results demonstrate that GLProtein outperforms previous methods on several bioinformatics tasks, including protein-protein interaction prediction and contact prediction, among others.
2023
Improving User Controlled Table-To-Text Generation Robustness
Hanxu Hu | Yunqing Liu | Zhongyi Yu | Laura Perez-Beltrachini
Findings of the Association for Computational Linguistics: EACL 2023
In this work we study user-controlled table-to-text generation, where users explore the content of a table by selecting cells and reading a natural language description thereof automatically produced by a natural language generator. Such generation models usually learn from carefully selected cell combinations (clean cell selections); in practice, however, users may select unexpected, redundant, or incoherent cell combinations (noisy cell selections). In experiments, we find that models perform well on test sets drawn from the same distribution as the training data, but their performance drops when evaluated on realistic noisy user inputs. We propose a fine-tuning regime with additional user-simulated noisy cell selections. Models fine-tuned with the proposed regime gain 4.85 BLEU points on noisy user test cases and 1.4 on clean test cases, and achieve comparable state-of-the-art performance on the ToTTo dataset.
Co-authors
- Wenqi Fan 1
- Hanxu Hu 1
- Laura Perez-Beltrachini 1
- Li Qing 1
- Xiaoyong Wei 1