Hongtao Liu


2023

pdf
Pre-trained Personalized Review Summarization with Effective Salience Estimation
Hongyan Xu | Hongtao Liu | Zhepeng Lv | Qing Yang | Wenjun Wang
Findings of the Association for Computational Linguistics: ACL 2023

Personalized review summarization in recommender systems is a challenging task of generating condensed summaries for product reviews while preserving the salient content of reviews. Recently, Pretrained Language Models (PLMs) have become a new paradigm in text generation for the strong ability of natural language comprehension. However, it is nontrivial to apply PLMs in personalized review summarization directly since there are rich personalized information (e.g., user preferences and product characteristics) to be considered, which is crucial to the salience estimation of input review. In this paper, we propose a pre-trained personalized review summarization method, which aims to effectively incorporate the personalized information of users and products into the salience estimation of the input reviews. We design a personalized encoder that could identify the salient contents of the input sequence by jointly considering the semantic and personalized information respectively (i.e., ratings, user and product IDs, and linguistic features), yielding personalized representations for the input reviews and history summaries separately. Moreover, we design an interactive information selection mechanism that further identifies the salient contents of the input reviews and selects relative information from the history summaries. The results on real-world datasets show that our method performs better than the state-of-the-art baselines and could generate more readable summaries.

pdf
PUNR: Pre-training with User Behavior Modeling for News Recommendation
Guangyuan Ma | Hongtao Liu | Xing W | Wanhui Qian | Zhepeng Lv | Qing Yang | Songlin Hu
Findings of the Association for Computational Linguistics: EMNLP 2023

News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. Existing works are mostly focused on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation, both towards effective user behavior modeling. Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors. In this way, the model could capture a much stronger and more comprehensive user news reading pattern. Besides, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on the real-world news benchmark show significant performance improvements over existing baselines.

pdf
Contrastive Pre-training for Personalized Expert Finding
Qiyao Peng | Hongtao Liu | Zhepeng Lv | Qing Yang | Wenjun Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Expert finding could help route questions to potential suitable users to answer in Community Question Answering (CQA) platforms. Hence it is essential to learn accurate representations of experts and questions according to the question text articles. Recently the pre-training and fine-tuning paradigms are powerful for natural language understanding, which has the potential for better question modeling and expert finding. Inspired by this, we propose a CQA-domain Contrastive Pre-training framework for Expert Finding, named CPEF, which could learn more comprehensive question representations. Specifically, considering that there is semantic complementation between question titles and bodies, during the domain pre-training phase, we propose a title-body contrastive learning task to enhance question representations, which directly treats the question title and the corresponding body as positive samples of each other, instead of designing extra data-augmentation strategies. Furthermore, a personalized tuning network is proposed to inject the personalized preferences of different experts during the fine-tuning phase. Extensive experimental results on six real-world datasets demonstrate that our method could achieve superior performance for expert finding.

2022

pdf
ExpertPLM: Pre-training Expert Representation for Expert Finding
Qiyao Peng | Hongtao Liu
Findings of the Association for Computational Linguistics: EMNLP 2022

Expert Finding is an important task in Community Question Answering (CQA) platforms, which could help route questions to potential users to answer. The key is to learn representations of experts based on their historical answered questions accurately. In this paper, inspired by the strong text understanding ability of Pretrained Language modelings (PLMs), we propose a pre-training and fine-tuning expert finding framework. The core is that we design an expert-level pre-training paradigm, that effectively integrates expert interest and expertise simultaneously. Specifically different from the typical corpus-level pre-training, we treat each expert as the basic pre-training unit including all the historical answered question titles of the expert, which could fully indicate the expert interests for questions. Besides, we integrate the vote score information along with each answer of the expert into the pre-training phrase to model the expert ability explicitly. Finally, we propose a novel reputation-augmented Masked Language Model (MLM) pre-training strategy to capture the expert reputation information. In this way, our method could learn expert representation comprehensively, which then will be adopted and fine-tuned in the down-streaming expert-finding task. Extensive experimental results on six real-world CQA datasets demonstrate the effectiveness of our method.