Rumei Li


Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Jian Song | Di Liang | Rumei Li | Yuntao Li | Sirui Wang | Minlong Peng | Wei Wu | Yongxin Yu
Findings of the Association for Computational Linguistics: EMNLP 2022

Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the Dependency-Enhanced Adaptive Fusion Attention (DAFA), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, (i) DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. (ii) It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching task.


Large-Scale Relation Learning for Question Answering over Knowledge Bases with Pre-trained Language Models
Yuanmeng Yan | Rumei Li | Sirui Wang | Hongzhi Zhang | Zan Daoguang | Fuzheng Zhang | Wei Wu | Weiran Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The key challenge of question answering over knowledge bases (KBQA) is the inconsistency between the natural language questions and the reasoning paths in the knowledge base (KB). Recent graph-based KBQA methods are good at grasping the topological structure of the graph but often ignore the textual information carried by the nodes and edges. Meanwhile, pre-trained language models learn massive open-world knowledge from the large corpus, but it is in the natural language form and not structured. To bridge the gap between the natural language and the structured KB, we propose three relation learning tasks for BERT-based KBQA, including relation extraction, relation matching, and relation reasoning. By relation-augmented training, the model learns to align the natural language expressions to the relations in the KB as well as reason over the missing connections in the KB. Experiments on WebQSP show that our method consistently outperforms other baselines, especially when the KB is incomplete.

ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Yuanmeng Yan | Rumei Li | Sirui Wang | Fuzheng Zhang | Wei Wu | Weiran Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Though BERT-based pre-trained language models achieve high performance on many downstream tasks, the native derived sentence representations are proved to be collapsed and thus produce a poor performance on the semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, that adopts contrastive learning to fine-tune BERT in an unsupervised and effective way. By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations and make them more applicable for downstream tasks. Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art, even comparable to the supervised SBERT-NLI. And when further incorporating NLI supervision, we achieve new state-of-the-art performance on STS tasks. Moreover, ConSERT obtains comparable results with only 1000 samples available, showing its robustness in data scarcity scenarios.