Weidong Guo


2022

MatRank: Text Re-ranking by Latent Preference Matrix
Jinwen Luo | Jiuding Yang | Weidong Guo | Chenglin Li | Di Niu | Yu Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Text ranking plays a key role in providing content that best answers user queries. To retrieve information efficiently for a given query, it is usually divided into two sub-tasks: text retrieval and text re-ranking. Recent research on pre-trained language models (PLMs) has yielded gains on both sub-tasks. However, while existing methods have benefited from PLMs and achieved high recall rates on passage retrieval, ranking performance still needs further improvement. In this paper, we propose MatRank, which learns to re-rank the text retrieved for a given query by predicting the most relevant passage from a latent preference matrix. Specifically, MatRank uses a PLM to generate an asymmetric latent matrix of relative preference scores between all pairs of retrieved passages. The latent matrix is then aggregated row-wise and column-wise to obtain global preferences and a prediction of the most relevant passage in each of the two directions. We conduct extensive experiments on the MS MARCO, WikiQA, and SemEval datasets. Experimental results show that MatRank achieves new state-of-the-art results on these datasets, outperforming all prior methods on ranking performance metrics.
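The row-wise and column-wise aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the matrix is aggregated, so summation, the sign convention (pref[i][j] > 0 meaning passage i is preferred over passage j), the softmax normalization, and the averaging of the two directions are all assumptions, as are the function names and the toy scores. In MatRank the matrix would come from a PLM; here it is hard-coded.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_preferences(pref):
    """Turn an n x n pairwise preference matrix into two distributions.

    Assumption: pref[i][j] > 0 means passage i is preferred over passage j.
    The matrix may be asymmetric, so pref[i][j] need not equal -pref[j][i].
    """
    n = len(pref)
    # Row-wise: how strongly passage i is preferred over the other passages.
    row_scores = [sum(pref[i][j] for j in range(n) if j != i) for i in range(n)]
    # Column-wise: how strongly the others are preferred over passage j,
    # negated so that a higher score again means "more relevant".
    col_scores = [-sum(pref[i][j] for i in range(n) if i != j) for j in range(n)]
    return softmax(row_scores), softmax(col_scores)

def rerank(pref):
    """Order passage indices by the mean of the two directional predictions
    (averaging is an assumption made for this sketch)."""
    row_probs, col_probs = aggregate_preferences(pref)
    combined = [(r + c) / 2 for r, c in zip(row_probs, col_probs)]
    return sorted(range(len(pref)), key=lambda i: combined[i], reverse=True)

# Toy, hand-written scores for three retrieved passages (hypothetical values).
pref = [
    [0.0, 1.2, 0.8],
    [-0.9, 0.0, 0.3],
    [-1.1, -0.2, 0.0],
]
row_probs, col_probs = aggregate_preferences(pref)
order = rerank(pref)
```

With these toy scores, passage 0 is ranked first in both directions, since it is preferred over the others (row view) and the others are not preferred over it (column view).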

Contrastive Learning enhanced Author-Style Headline Generation
Hui Liu | Weidong Guo | Yige Chen | Xiangyang Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Headline generation is the task of generating an appropriate headline for a given article, which can be used for machine-aided writing or to improve the click-through ratio. Existing work uses only the article itself during generation and does not take the writing style of headlines into consideration. In this paper, we propose a novel Seq2Seq model called CLH3G (Contrastive Learning enhanced Historical Headlines based Headline Generation), which uses the historical headlines of articles the author wrote in the past to improve headline generation for the current article. By taking historical headlines into account, we can integrate the stylistic features of the author into our model and generate a headline that is not only appropriate for the article but also consistent with the author’s style. To learn the stylistic features of the author efficiently, we further introduce a contrastive learning based auxiliary task for the encoder of our model. In addition, we propose two methods that use the learned stylistic features to guide both the pointer and the decoder during generation. Experimental results show that historical headlines of the same user improve headline generation significantly, and that both the contrastive learning module and the two style feature fusion methods further boost performance.
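The abstract does not specify the form of the contrastive auxiliary task. As a rough illustration of what such an objective typically looks like, the sketch below computes an InfoNCE-style loss, a common contrastive formulation and not necessarily the one used in CLH3G. It assumes that headline style vectors from the same author serve as positives and those from other authors as negatives; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-probability of picking the positive
    among the positive and all negatives, using temperature-scaled cosine
    similarity as the logit."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Toy 2-d "style" vectors (hypothetical; a real model would use encoder outputs).
anchor = [1.0, 0.0]            # headline by author A
same_author = [0.9, 0.1]       # another headline by author A
other_authors = [[0.0, 1.0], [-1.0, 0.0]]

# Low loss when the positive really is close to the anchor.
loss_aligned = info_nce(anchor, same_author, other_authors)
# Higher loss when a dissimilar vector is treated as the positive.
loss_misaligned = info_nce(anchor, other_authors[0], [same_author, other_authors[1]])
```

Minimizing a loss of this kind pulls representations of the same author's headlines together and pushes those of different authors apart, which is the general effect a contrastive auxiliary task on the encoder aims for.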

2021

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo | Mingjun Zhao | Lusheng Zhang | Di Niu | Jinwen Luo | Zhenhua Liu | Zhenyang Li | Jianbo Tang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021