Zheng Liu


2022

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
Shitao Xiao | Zheng Liu | Yingxia Shao | Zhao Cao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Despite pre-training’s progress in many important NLP tasks, it remains to explore effective pre-training strategies for dense retrieval. In this paper, we propose RetroMAE, a new retrieval-oriented pre-training paradigm based on Masked Auto-Encoder (MAE). RetroMAE is highlighted by three critical designs. 1) A novel MAE workflow, where the input sentence is polluted for encoder and decoder with different masks. The sentence embedding is generated from the encoder’s masked input; then, the original sentence is recovered based on the sentence embedding and the decoder’s masked input via masked language modeling. 2) Asymmetric model structure, with a full-scale BERT-like transformer as encoder, and a one-layer transformer as decoder. 3) Asymmetric masking ratios, with a moderate ratio for encoder: 15~30%, and an aggressive ratio for decoder: 50~70%. Our framework is simple to realize and empirically competitive: the pre-trained models dramatically improve the SOTA performances on a wide range of dense retrieval benchmarks, like BEIR and MS MARCO. The source code and pre-trained models are made publicly available at https://github.com/staoxiao/RetroMAE so as to inspire more interesting research.
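The asymmetric masking described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`mask_tokens`, `asymmetric_masks`), the whitespace tokenization, and the specific ratios 0.3 and 0.5 (picked from within the ranges the abstract reports) are assumptions made for illustration only.

```python
import random

MASK = "[MASK]"  # placeholder mask token, assumed for illustration


def mask_tokens(tokens, ratio, rng):
    """Replace round(ratio * len(tokens)) randomly chosen positions with MASK."""
    n_mask = round(ratio * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def asymmetric_masks(tokens, enc_ratio=0.3, dec_ratio=0.5, seed=0):
    """Draw two independent masks: a moderate one for the encoder input
    and a more aggressive one for the decoder input."""
    rng = random.Random(seed)
    enc_input = mask_tokens(tokens, enc_ratio, rng)
    dec_input = mask_tokens(tokens, dec_ratio, rng)
    return enc_input, dec_input


# Hypothetical example sentence, split on whitespace (10 tokens).
tokens = "dense retrieval maps queries and passages into one embedding space".split()
enc_input, dec_input = asymmetric_masks(tokens)
print(enc_input)  # 3 of 10 positions masked
print(dec_input)  # 5 of 10 positions masked
```

In the workflow the abstract describes, the encoder would produce a sentence embedding from `enc_input`, and the one-layer decoder would recover the original sentence from that embedding together with `dec_input`; the model side is omitted here.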

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding
Zhaoye Fei | Yu Tian | Yongkang Wu | Xinyu Zhang | Yutao Zhu | Zheng Liu | Jiawen Wu | Dejiang Kong | Ruofei Lai | Zhao Cao | Zhicheng Dou | Xipeng Qiu
Proceedings of the 29th International Conference on Computational Linguistics

Generalized text representations are the foundation of many natural language understanding tasks. To fully utilize the different corpora, it is inevitable that models need to understand the relevance among them. However, many methods ignore the relevance and adopt a single-channel model (a coarse paradigm) directly for all tasks, which lacks enough rationality and interpretation. In addition, some existing works learn downstream tasks by stitching skill blocks (a fine paradigm), which might cause irrational results due to its redundancy and noise. In this work, we first analyze the task correlation through three different perspectives, i.e., data property, manual design, and model-based relevance, based on which the similar tasks are grouped together. Then, we propose a hierarchical framework with a coarse-to-fine paradigm, with the bottom level shared to all the tasks, the mid-level divided to different groups, and the top-level assigned to each of the tasks. This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact from irrelevant tasks. Our experiments on 13 benchmark datasets across five natural language understanding tasks demonstrate the superiority of our method.

Towards Generalizeable Semantic Product Search by Text Similarity Pre-training on Search Click Logs
Zheng Liu | Wei Zhang | Yan Chen | Weiyi Sun | Tianchuan Du | Benjamin Schroeder
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Recently, semantic search has been successfully applied to e-commerce product search, and the learned semantic space for query and product encoding is expected to generalize well to unseen queries or products. Yet, whether generalization can conveniently emerge has not been thoroughly studied in the domain thus far. In this paper, we examine several general-domain and domain-specific pre-trained RoBERTa variants and discover that general-domain fine-tuning does not really help generalization, which aligns with the discovery of prior art, yet proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of manually annotated query-product relevance data.

2021

Matching-oriented Embedding Quantization For Ad-hoc Retrieval
Shitao Xiao | Zheng Liu | Yingxia Shao | Defu Lian | Xing Xie
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, where the embedding and quantization models can be jointly trained with supervised learning. However, there is a lack of appropriate formulation of the joint training objective; thus, the improvements over previous non-supervised baselines are limited in reality. In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated. With the minimization of MCL, we are able to maximize the matching probability of query and ground-truth key, which contributes to the optimal retrieval accuracy. Given that the exact computation of MCL is intractable due to the demand of vast contrastive samples, we further propose the Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for precise approximation of MCL. We conduct extensive experimental studies on four real-world datasets, whose results verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.
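For readers unfamiliar with product quantization itself, the following is a minimal sketch of plain PQ encoding and decoding with fixed codebooks. It does not implement MoPQ, the Multinoulli Contrastive Loss, or Differentiable Cross-device Sampling; the function names and the toy codebooks are assumptions chosen only to show how a vector is split into sub-vectors and each sub-vector is replaced by the index of its nearest codeword.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Split `vector` into len(codebooks) equal sub-vectors and return,
    for each sub-space, the index of the nearest codeword."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the selected codewords."""
    out = []
    for m, k in enumerate(codes):
        out.extend(codebooks[m][k])
    return out


# Toy setup (hypothetical values): 4-dim vectors, 2 sub-spaces, 2 codewords each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codewords for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # codewords for dimensions 2-3
]
vector = [0.9, 0.8, 0.1, 0.7]
codes = pq_encode(vector, codebooks)
reconstruction = pq_decode(codes, codebooks)
print(codes)           # [1, 0]
print(reconstruction)  # [1.0, 1.0, 0.0, 1.0]
```

In this unsupervised form the codebooks are fixed in advance. The supervised setting the abstract discusses would instead train the embedding model and the codebooks jointly against a retrieval objective.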

Leveraging Bidding Graphs for Advertiser-Aware Relevance Modeling in Sponsored Search
Shuxian Bi | Chaozhuo Li | Xiao Han | Zheng Liu | Xing Xie | Haizhen Huang | Zengxuan Wen
Findings of the Association for Computational Linguistics: EMNLP 2021

Recently, sponsored search has become one of the most lucrative channels for marketing. As the fundamental basis of sponsored search, relevance modeling has attracted increasing attention due to its tremendous practical value. Most existing methods solely rely on the query-keyword pairs. However, keywords are usually short texts with scarce semantic information, which may not precisely reflect the underlying advertising intents. In this paper, we investigate the novel problem of advertiser-aware relevance modeling, which leverages the advertisers’ information to bridge the gap between the search intents and advertising purposes. Our motivation lies in incorporating the unsupervised bidding behaviors as the complementary graphs to learn desirable advertiser representations. We further propose a Bidding-Graph augmented Triple-based Relevance model (BGTR) with three towers to deeply fuse the bidding graphs and semantic textual data. Empirically, we evaluate the BGTR model over a large industry dataset, and the experimental results consistently demonstrate its superiority.

2020

Fine-grained Interest Matching for Neural News Recommendation
Heyuan Wang | Fangzhao Wu | Zheng Liu | Xing Xie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Personalized news recommendation is a critical technology to improve users’ online news reading experience. The core of news recommendation is accurate matching between a user’s interests and candidate news. The same user usually has diverse interests that are reflected in the different news she has browsed. Meanwhile, important semantic features of news are implied in text segments of different granularities. Existing studies generally represent each user as a single vector and then match the candidate news vector, which may lose fine-grained information for recommendation. In this paper, we propose FIM, a Fine-grained Interest Matching method for neural news recommendation. Instead of aggregating all of a user’s historically browsed news into a unified vector, we hierarchically construct multi-level representations for each news via stacked dilated convolutions. Then we perform fine-grained matching between segment pairs of each browsed news and the candidate news at each semantic level. High-order salient signals are then identified by resembling the hierarchy of image recognition for final click prediction. Extensive experiments on a real-world dataset from MSN News validate the effectiveness of our model on news recommendation.

2019

Neural News Recommendation with Long- and Short-term User Representations
Mingxiao An | Fangzhao Wu | Chuhan Wu | Kun Zhang | Zheng Liu | Xing Xie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Personalized news recommendation is important to help users find news they are interested in and improve their reading experience. A key problem in news recommendation is learning accurate user representations to capture their interests. Users usually have both long-term preferences and short-term interests. However, existing news recommendation methods usually learn single representations of users, which may be insufficient. In this paper, we propose a neural news recommendation approach which can learn both long- and short-term user representations. The core of our approach is a news encoder and a user encoder. In the news encoder, we learn representations of news from their titles and topic categories, and use an attention network to select important words. In the user encoder, we propose to learn long-term user representations from the embeddings of their IDs. In addition, we propose to learn short-term user representations from their recently browsed news via a GRU network. Besides, we propose two methods to combine long-term and short-term user representations. The first one is using the long-term user representation to initialize the hidden state of the GRU network in short-term user representation. The second one is concatenating both long- and short-term user representations as a unified user vector. Extensive experiments on a real-world dataset show our approach can effectively improve the performance of neural news recommendation.
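The two combination methods named in this abstract can be sketched as follows. This is an illustrative toy, not the authors' model: the GRU cell is a bias-free, pure-Python version with random weights, and every name here (`gru_step`, `user_vector_init`, `user_vector_concat`, the dimensions, and the example vectors) is an assumption introduced for the example.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h, x, p):
    """One GRU update without bias terms: reset gate r, update gate z, candidate n."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    uh = matvec(p["Un"], h)
    n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(p["Wn"], x), r, uh)]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def short_term(news_vectors, h0, p):
    """Run the GRU over the browsed news vectors, starting from hidden state h0."""
    h = h0
    for x in news_vectors:
        h = gru_step(h, x, p)
    return h


def user_vector_init(long_term, news_vectors, p):
    """Method 1: the long-term vector initializes the GRU hidden state."""
    return short_term(news_vectors, long_term, p)


def user_vector_concat(long_term, news_vectors, p):
    """Method 2: run the GRU from a zero state, then concatenate both vectors."""
    zero = [0.0] * len(long_term)
    return short_term(news_vectors, zero, p) + long_term


# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
hidden, news_dim = 3, 4
p = {
    name: [[rng.uniform(-0.5, 0.5)
            for _ in range(news_dim if name.startswith("W") else hidden)]
           for _ in range(hidden)]
    for name in ("Wr", "Ur", "Wz", "Uz", "Wn", "Un")
}
long_term = [0.2, -0.1, 0.4]  # stands in for a learned user-ID embedding
browsed = [[rng.uniform(-1, 1) for _ in range(news_dim)] for _ in range(5)]

u_init = user_vector_init(long_term, browsed, p)
u_concat = user_vector_concat(long_term, browsed, p)
print(len(u_init), len(u_concat))  # 3 6
```

Note that the first method requires the long-term vector to have the same dimension as the GRU hidden state, while the second doubles the size of the final user vector.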