Di Niu


2022

MatRank: Text Re-ranking by Latent Preference Matrix
Jinwen Luo | Jiuding Yang | Weidong Guo | Chenglin Li | Di Niu | Yu Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Text ranking plays a key role in providing content that best answers user queries. It is usually divided into two sub-tasks to perform efficient information retrieval given a query: text retrieval and text re-ranking. Recent research on pre-trained language models (PLMs) has demonstrated efficiency and performance gains on both sub-tasks. However, while existing methods have benefited from pre-trained language models and achieved high recall rates on passage retrieval, ranking performance still needs improvement. In this paper, we propose MatRank, which learns to re-rank the text retrieved for a given query by learning to predict the most relevant passage based on a latent preference matrix. Specifically, MatRank uses a PLM to generate an asymmetric latent matrix of relative preference scores between all pairs of retrieved passages. Then, the latent matrix is aggregated row-wise and column-wise to obtain global preferences and a prediction of the most relevant passage in each of the two directions. We conduct extensive experiments on the MS MARCO, WikiQA, and SemEval datasets. Experimental results show that MatRank achieves new state-of-the-art results on these datasets, outperforming all prior methods on ranking performance metrics.
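
The row-wise and column-wise aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the matrix is supplied by hand rather than produced by a PLM, and the function names `aggregate_preferences` and `rerank`, the convention that `pref[i][j] > 0` means passage `i` is preferred over passage `j`, the use of a mean as the aggregator, and the summing of the two directions into one score are all assumptions made for illustration.

```python
import numpy as np


def aggregate_preferences(pref):
    """Aggregate an n x n pairwise preference matrix in both directions.

    Assumption: pref[i, j] > 0 means passage i is preferred over passage j.
    The matrix may be asymmetric; diagonal entries are ignored.
    """
    pref = np.asarray(pref, dtype=float)
    if pref.ndim != 2 or pref.shape[0] != pref.shape[1]:
        raise ValueError("pref must be a square matrix")
    n = pref.shape[0]
    if n < 2:
        raise ValueError("need at least two passages")

    # Zero out self-comparisons so they do not affect the averages.
    masked = np.where(~np.eye(n, dtype=bool), pref, 0.0)

    # Row-wise: how strongly passage i is preferred over the others.
    row_score = masked.sum(axis=1) / (n - 1)
    # Column-wise: how strongly the others are preferred over passage j,
    # negated so that a higher value again means "more relevant".
    col_score = -masked.sum(axis=0) / (n - 1)
    return row_score, col_score


def rerank(pref):
    """Return passage indices ordered from most to least preferred.

    Assumption: the two directional scores are simply added together.
    """
    row_score, col_score = aggregate_preferences(pref)
    combined = row_score + col_score
    return np.argsort(-combined, kind="stable")


# Hand-made asymmetric matrix for three retrieved passages (hypothetical data).
example_pref = [
    [0.0, 2.0, 1.0],
    [-1.0, 0.0, 0.5],
    [-2.0, -1.0, 0.0],
]
print(rerank(example_pref))
```

In this toy matrix, passage 0 wins both of its row comparisons and loses none of its column comparisons, so both directions select it as the most relevant passage.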

2021

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo | Mingjun Zhao | Lusheng Zhang | Di Niu | Jinwen Luo | Zhenhua Liu | Zhenyang Li | Jianbo Tang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2019

Matching Article Pairs with Graphical Decomposition and Convolutions
Bang Liu | Di Niu | Haojie Wei | Jinghong Lin | Yancheng He | Kunfeng Lai | Yu Xu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Identifying the relationship between two articles, e.g., whether two articles published from different sources describe the same breaking news, is critical to many document understanding tasks. Existing approaches for modeling and matching sentence pairs do not perform well in matching longer documents, which embody more complex interactions between the enclosed entities than a sentence does. To model article pairs, we propose the Concept Interaction Graph to represent an article as a graph of concepts. We then match a pair of articles by comparing the sentences that enclose the same concept vertex through a series of encoding techniques, and aggregate the matching signals through a graph convolutional network. To facilitate the evaluation of long article matching, we have created two datasets, each consisting of about 30K pairs of breaking news articles covering diverse topics in the open domain. Extensive evaluations of the proposed methods on the two datasets demonstrate significant improvements over a wide range of state-of-the-art methods for natural language matching.