Ji-Rong Wen

Also published as: Ji-rong Wen


2023

pdf
Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization
Ze-Feng Gao | Kun Zhou | Peiyu Liu | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

By scaling the model size, large pre-trained language models (PLMs) have shown remarkable performance in various natural language processing tasks, mostly outperforming small PLMs by a large margin.However, due to the high computational cost, the huge number of parameters also restricts the applicability of large PLMs in real-world systems. In this paper, we focus on scaling up the parameters of PLMs only during fine-tuning, to benefit from the over-parameterization, while without increasing the inference latency. Given a relatively small PLM, we over-parameterize it by employing a matrix product operator, an efficient and almost lossless decomposition method to factorize its contained parameter matrices into a set of higher-dimensional tensors.Considering the efficiency, we further propose both static and dynamic strategies to select the most important parameter matrices for over-parameterization.Extensive experiments have demonstrated that our approach can significantly boost the fine-tuning performance of small PLMs and even help small PLMs outperform parameterized larger ones.Our code is publicly available at https://github.com/zfgao66/OPF.

pdf
TOME: A Two-stage Approach for Model-based Retrieval
Ruiyang Ren | Wayne Xin Zhao | Jing Liu | Hua Wu | Ji-Rong Wen | Haifeng Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters. This design employs a sequence-to-sequence paradigm to generate document identifiers, which enables the complete capture of the relevance between queries and documents and simplifies the classic index-retrieval-rerank pipeline. Despite its attractive qualities, there remain several major challenges in model-based retrieval, including the discrepancy between pre-training and fine-tuning, and the discrepancy between training and inference. To deal with the above challenges, we propose a novel two-stage model-based retrieval approach called TOME, which makes two major technical contributions, including the utilization of tokenized URLs as identifiers and the design of a two-stage generation architecture. We also propose a number of training strategies to deal with the training difficulty as the corpus size increases. Extensive experiments and analysis on MS MARCO and Natural Questions demonstrate the effectiveness of our proposed approach, and we investigate the scaling laws of TOME by examining various influencing factors.

pdf
Learning to Imagine: Visually-Augmented Natural Language Generation
Tianyi Tang | Yushuo Chen | Yifan Du | Junyi Li | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose a method, LIVE, that makes pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input texts. Second, we use CLIP to determine whether the text can evoke the imagination in a posterior way. Finally, our imagination is dynamic, and we conduct synthesis for each sentence rather than generate only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text. Our vision-text fusion layer is compatible with Transformer-based architecture. We have conducted extensive experiments on four generation tasks using BART and T5, and the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at the link: https://github.com/RUCAIBox/LIVE.

pdf
Visually-augmented pretrained language models for NLP tasks without images
Hangyu Guo | Kun Zhou | Wayne Xin Zhao | Qinyu Zhang | Ji-Rong Wen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although pre-trained language models (PLMs) have shown impressive performance by text-only self-supervised training, they are found lack of visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel **V**isually-**A**ugmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks, **W**ithout using any retrieved or generated **I**mages, namely **VAWI**. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks. Our codes and data are publicly available at https://github.com/RUCAIBox/VAWI.

pdf
The Web Can Be Your Oyster for Improving Language Models
Junyi Li | Tianyi Tang | Wayne Xin Zhao | Jingyuan Wang | Jian-Yun Nie | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL 2023

Pretrained language models (PLMs) encode a large amount of world knowledge. However, as such knowledge is frozen at the time of model training, the models become static and limited by the training data at that time. In order to further improve the capacity of PLMs for knowledge-intensive tasks, we consider augmenting PLMs with the large-scale web using search engine. Unlike previous augmentation sources (e.g., Wikipedia data dump), the web provides broader, more comprehensive and constantly updated information. In this paper, we present a web-augmented PLM – UniWeb, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format. Instead of simply using the retrieved contents from web, our approach has made two major improvements. Firstly, we propose an adaptive search engine assisted learning method that can self-evaluate the confidence level of PLM’s predictions, and adaptively determine when to refer to the web for more data, which can avoid useless or noisy augmentation from web. Secondly, we design a pretraining task, i.e., continual knowledge learning, based on salient spans prediction, to reduce the discrepancy between the encoded and retrieved knowledge. Experiments on a wide range of knowledge-intensive tasks show that our model significantly outperforms previous retrieval-augmented methods.

pdf
MVP: Multi-task Supervised Pre-training for Natural Language Generation
Tianyi Tang | Junyi Li | Wayne Xin Zhao | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL 2023

Pre-trained language models (PLMs) have achieved remarkable success in natural language generation (NLG) tasks. Up to now, most NLG-oriented PLMs are pre-trained in an unsupervised manner using the large-scale general corpus. In the meanwhile, an increasing number of models pre-trained with labeled data (i.e. “supervised pre-training”) showcase superior performance compared to unsupervised pre-trained models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. We collect a large-scale natural language generation corpus, MVPCorpus, from 77 datasets over 11 diverse NLG tasks. Then we unify these examples into a general text-to-text format to pre-train the text generation model MVP in a supervised manner. For each task, we further pre-train specific soft prompts to stimulate the model’s capacity to perform a specific task. Our MVP model can be seen as a practice that utilizes recent instruction tuning on relatively small PLMs. Extensive experiments have demonstrated the effectiveness and generality of our MVP model in a number of NLG tasks, which achieves state-of-the-art performance on 13 out of 17 datasets, outperforming BART by 9.3% and Flan-T5 by 5.8%.

pdf
Zero-shot Visual Question Answering with Language Model Feedback
Yifan Du | Junyi Li | Tianyi Tang | Wayne Xin Zhao | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL 2023

In this paper, we propose a novel language model guided captioning approach, LAMOC, for knowledge-based visual question answering (VQA). Our approach employs the generated captions by a captioning model as the context of an answer prediction model, which is a Pre-Trained Language model (PLM). As the major contribution, we leverage the guidance and feedback of the prediction model to improve the capability of the captioning model. In this way, the captioning model can become aware of the task goal and information need from the PLM. To develop our approach, we design two specific training stages, where the first stage adapts the captioning model to the prediction model (selecting more suitable caption propositions for training) and the second stage tunes the captioning model according to the task goal (learning from feedback of the PLM). Extensive experiments demonstrate the effectiveness of the proposed approach on the knowledge-based VQA task. Specifically, on the challenging A-OKVQA dataset, LAMOC outperforms several competitive zero-shot methods and even achieves comparable results to a fine-tuned VLP model. Our code is publicly available at https://github.com/RUCAIBox/LAMOC.

pdf
SSP: Self-Supervised Post-training for Conversational Search
Quan Tu | Shen Gao | Xiaolong Wu | Zhao Cao | Ji-Rong Wen | Rui Yan
Findings of the Association for Computational Linguistics: ACL 2023

Conversational search has been regarded as the next-generation search paradigm.Constrained by data scarcity, most existing methods distill the well-trained ad-hoc retriever to the conversational retriever.However, these methods, which usually initialize parameters by query reformulation to discover contextualized dependency, have trouble in understanding the dialogue structure information and struggle with contextual semantic vanishing.In this paper, we propose {pasted macro ‘FULLMODEL’} ({pasted macro ‘MODEL’}) which is a new post-training paradigm with three self-supervised tasks to efficiently initialize the conversational search model to enhance the dialogue structure and contextual semantic understanding.Furthermore, the {pasted macro ‘MODEL’} can be plugged into most of the existing conversational models to boost their performance.To verify the effectiveness of our proposed method, we apply the conversational encoder post-trained by {pasted macro ‘MODEL’} on the conversational search task using two benchmark datasets: CAsT-19 and CAsT-20.Extensive experiments that our {pasted macro ‘MODEL’} can boost the performance of several existing conversational search methods. Our source code is available at https://github.com/morecry/SSP.

2022

pdf
Semantic Sentence Matching via Interacting Syntax Graphs
Chen Xu | Jun Xu | Zhenhua Dong | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Studies have shown that the sentence’s syntactic structures are important for semantic sentence matching. A typical approach is encoding each sentence’s syntactic structure into an embedding vector, which can be combined with other features to predict the final matching scores. Though successes have been observed, embedding the whole syntactic structures as one vector inevitably overlooks the fine-grained syntax matching patterns, e.g. the alignment of specific term dependencies relations in the two inputted sentences. In this paper, we formalize the task of semantic sentence matching as a problem of graph matching in which each sentence is represented as a directed graph according to its syntactic structures. The syntax matching patterns (i.e. similar syntactic structures) between two sentences, therefore, can be extracted as the sub-graph structure alignments. The proposed method, referred to as Interacted Syntax Graphs (ISG), represents two sentences’ syntactic alignments as well as their semantic matching signals into one association graph. After that, the neural quadratic assignment programming (QAP) is adapted to extract syntactic matching patterns from the association graph. In this way, the syntactic structures fully interact in a fine granularity during the matching process. Experimental results on three public datasets demonstrated that ISG can outperform the state-of-the-art baselines effectively and efficiently. The empirical analysis also showed that ISG can match sentences in an interpretable way.

pdf
Optimal Partial Transport Based Sentence Selection for Long-form Document Matching
Weijie Yu | Liang Pang | Jun Xu | Bing Su | Zhenhua Dong | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

One typical approach to long-form document matching is first conducting alignment between cross-document sentence pairs, and then aggregating all of the sentence-level matching signals. However, this approach could be problematic because the alignment between documents is partial — despite two documents as a whole are well-matched, most of the sentences could still be dissimilar. Those dissimilar sentences lead to spurious sentence-level matching signals which may overwhelm the real ones, increasing the difficulties of learning the matching function. Therefore, accurately selecting the key sentences for document matching is becoming a challenging issue. To address the issue, we propose a novel matching approach that equips existing document matching models with an Optimal Partial Transport (OPT) based component, namely OPT-Match, which selects the sentences that play a major role in matching. Enjoying the partial transport properties of OPT, the selected key sentences can not only effectively enhance the matching accuracy, but also be explained as the rationales for the matching results. Extensive experiments on four publicly available datasets demonstrated that existing methods equipped with OPT-Match consistently outperformed the corresponding underlying methods. Evaluations also showed that the key sentences selected by OPT-Match were consistent with human-provided rationales.

pdf
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models
Ze-Feng Gao | Peiyu Liu | Wayne Xin Zhao | Zhong-Yi Lu | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Recently, Mixture-of-Experts (short as MoE) architecture has achieved remarkable success in increasing the model capacity of large-scale language models. However, MoE requires incorporating significantly more parameters than the base model being extended. In this paper, we propose building a parameter-efficient MoE architecture by sharing information across experts. We adopt matrix product operator (MPO, a tensor decomposition from quantum many-body physics) to reconstruct the parameter matrix in the expert layer and increase model capacity for pre-trained language models by sharing parameters of the central tensor (containing the core information) among different experts while enabling the specificity through the auxiliary tensors (complementing the central tensor) of different experts. To address the unbalanced optimization issue, we further design the gradient mask strategy for the MPO-based MoE architecture. Extensive experiments based on T5 and GPT-2 show improved performance and efficiency of the pre-trained language model (27.2x reduction in total parameters for the superior model performance, compared with the Switch Transformers). Our code is publicly available at https://github.com/RUCAIBox/MPO/MPOE.

pdf
Context-Tuning: Learning Contextualized Prompts for Natural Language Generation
Tianyi Tang | Junyi Li | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Recently, pretrained language models (PLMs) have had exceptional success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful paradigm is to use prompts in the form of either discrete tokens or continuous embeddings. In existing studies, these prompting methods are typically independent of the inputs, lacking sufficient consideration of input semantics. To address this issue, we propose a novel continuous prompting approach, called context-tuning, to fine-tuning PLMs for natural language generation. Firstly, the prompts are derived based on the input text to elicit useful knowledge from PLMs for generation. We refer to such prompts as contextualized prompts. Secondly, we use continuous inverse prompting to improve the process of natural language generation by modeling an inverse generation process from output to input, making the generated text more relevant to the inputs. Furthermore, we utilize a lightweight context-tuning method that fine-tunes only 0.12% of the parameters while maintaining good performance. Our code is publicly available at https://github.com/RUCAIBox/Context-Tuning.

pdf
Learning to Transfer Prompts for Text Generation
Junyi Li | Tianyi Tang | Jian-Yun Nie | Ji-Rong Wen | Xin Zhao
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Pretrained language models (PLMs) have made remarkable progress in text generation tasks via fine-tuning. While, it is challenging to fine-tune PLMs in a data-scarce situation. Therefore, it is non-trivial to develop a general and lightweight model that can adapt to various text generation tasks based on PLMs. To fulfill this purpose, the recent prompt-based learning offers a potential solution. In this paper, we improve this technique and propose a novel prompt-based method (PTG) for text generation in a transferable setting. First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks. To consider both task- and instance-level information, we design an adaptive attention mechanism to derive the target prompts. For each data instance, PTG learns a specific target prompt by attending to highly relevant source prompts. In extensive experiments, PTG yields competitive or better results than fine-tuning methods. We release our source prompts as an open resource, where users can add or reuse them to improve new text generation tasks for future research. Code and data can be available at https://github.com/RUCAIBox/Transfer-Prompts-for-Text-Generation.

pdf
ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models
Junyi Li | Tianyi Tang | Zheng Gong | Lixin Yang | Zhuohao Yu | Zhipeng Chen | Jingyuan Wang | Xin Zhao | Ji-Rong Wen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. While, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, memory, comprehension, reasoning, and composition, to measure ten widely-used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for more deep and detailed analysis on the language abilities of PLMs. This paper can guide the future work to select, apply, and design PLMs for specific tasks. We have made all the details of experiments publicly available at https://github.com/RUCAIBox/ElitePLM.

pdf
Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation
Hanxun Zhong | Zhicheng Dou | Yutao Zhu | Hongjin Qian | Ji-Rong Wen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Personalized dialogue systems explore the problem of generating responses that are consistent with the user’s personality, which has raised much attention in recent years. Existing personalized dialogue systems have tried to extract user profiles from dialogue history to guide personalized response generation. Since the dialogue history is usually long and noisy, most existing methods truncate the dialogue history to model the user’s personality. Such methods can generate some personalized responses, but a large part of dialogue history is wasted, leading to sub-optimal performance of personalized response generation. In this work, we propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more abundant and accurate persona information. Specifically, we design an MSP model which consists of three personal information refiners and a personalized response generator. With these multi-level refiners, we can sparsely extract the most valuable information (tokens) from the dialogue history and leverage other similar users’ data to enhance personalization. Experimental results on two real-world datasets demonstrate the superiority of our model in generating more informative and personalized responses.

pdf
Finding the Dominant Winning Ticket in Pre-Trained Language Models
Zhuocheng Gong | Di He | Yelong Shen | Tie-Yan Liu | Weizhu Chen | Dongyan Zhao | Ji-Rong Wen | Rui Yan
Findings of the Association for Computational Linguistics: ACL 2022

The Lottery Ticket Hypothesis suggests that for any over-parameterized model, a small subnetwork exists to achieve competitive performance compared to the backbone architecture. In this paper, we study whether there is a winning lottery ticket for pre-trained language models, which allow the practitioners to fine-tune the parameters in the ticket but achieve good downstream performance. To achieve this, we regularize the fine-tuning process with L1 distance and explore the subnetwork structure (what we refer to as the “dominant winning ticket”). Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix. Strikingly, we find that a dominant winning ticket that takes up 0.05% of the parameters can already achieve satisfactory performance, indicating that the PLM is significantly reducible during fine-tuning.

pdf
Great~Truths~are ~Always ~Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models
Jinhao Jiang | Kun Zhou | Ji-Rong Wen | Xin Zhao
Findings of the Association for Computational Linguistics: NAACL 2022

Commonsense reasoning in natural language is a desired ability of artificial intelligent systems. For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder that models a commonsense knowledge graph (CSKG).Despite the effectiveness, these approaches are built on heavy architectures, and can’t clearly explain how external knowledge resources improve the reasoning capacity of PTMs. Considering this issue, we conduct a deep empirical analysis, and find that it is indeed relation features from CSKGs (but not node features) that mainly contribute to the performance improvement of PTMs. Based on this finding, we design a simple MLP-based knowledge encoder that utilizes statistical relation paths as features. Extensive experiments conducted on five benchmarks demonstrate the effectiveness of our approach, which also largely reduces the parameters for encoding CSKGs.Our codes and data are publicly available at https://github.com/RUCAIBox/SAFE.

pdf
A Multi-Modal Knowledge Graph for Classical Chinese Poetry
Yuqing Li | Yuxin Zhang | Bin Wu | Ji-Rong Wen | Ruihua Song | Ting Bai
Findings of the Association for Computational Linguistics: EMNLP 2022

Classical Chinese poetry has a long history and is a precious cultural heritage of humankind. Displaying the classical Chinese poetry in a visual way, helps to cross cultural barriers in different countries, making it enjoyable for all the people. In this paper, we construct a multi-modal knowledge graph for classical Chinese poetry (PKG), in which the visual information of words in the poetry are incorporated. Then a multi-modal pre-training language model, PKG-Bert, is proposed to obtain the poetry representation with visual information, which bridges the semantic gap between different modalities. PKG-Bert achieves the state-of-the-art performance on the poetry-image retrieval task, showing the effectiveness of incorporating the multi-modal knowledge. The large-scale multi-modal knowledge graph of classical Chinese poetry will be released to promote the researches in classical Chinese culture area.

pdf
MISC: A Mixed Strategy-Aware Model integrating COMET for Emotional Support Conversation
Quan Tu | Yanran Li | Jianwei Cui | Bin Wang | Ji-Rong Wen | Rui Yan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Applying existing methods to emotional support conversation—which provides valuable assistance to people who are in need—has two major limitations: (a) they generally employ a conversation-level emotion label, which is too coarse-grained to capture user’s instant mental state; (b) most of them focus on expressing empathy in the response(s) rather than gradually reducing user’s distress. To address the problems, we propose a novel model MISC, which firstly infers the user’s fine-grained emotional status, and then responds skillfully using a mixture of strategy. Experimental results on the benchmark dataset demonstrate the effectiveness of our method and reveal the benefits of fine-grained emotion understanding as well as mixed-up strategy modeling.

pdf
There Are a Thousand Hamlets in a Thousand People’s Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory
Tingchen Fu | Xueliang Zhao | Chongyang Tao | Ji-Rong Wen | Rui Yan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge-grounded conversation (KGC) shows great potential in building an engaging and knowledgeable chatbot, and knowledge selection is a key ingredient in it. However, previous methods for knowledge selection only concentrate on the relevance between knowledge and dialogue context, ignoring the fact that age, hobby, education and life experience of an interlocutor have a major effect on his or her personal preference over external knowledge. Without taking the personalization issue into account, it is difficult for existing dialogue systems to select the proper knowledge and generate persona-consistent responses.In this work, we introduce personal memory into knowledge selection in KGC to address the personalization issue. We propose a variational method to model the underlying relationship between one’s personal memory and his or her selection of knowledge, and devise a learning scheme in which the forward mapping from personal memory to knowledge and its inverse mapping is included in a closed loop so that they could teach each other. Experiment results show that our methods outperform existing KGC methods significantly on both automatic evaluation and human evaluation.

pdf
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network
Zheng Gong | Kun Zhou | Xin Zhao | Jing Sha | Shijin Wang | Ji-Rong Wen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we study how to continually pre-train language models for improving the understanding of math problems. Specifically, we focus on solving a fundamental challenge in modeling math problems, how to fuse the semantics of textual description and formulas, which are highly different in essence. To address this issue, we propose a new approach called COMUS to continually pre-train language models for math problem understanding with syntax-aware memory network. In this approach, we first construct the math syntax graph to model the structural semantic information, by combining the parsing trees of the text and formulas, and then design the syntax-aware memory networks to deeply fuse the features from the graph and text. With the help of syntax relations, we can model the interaction between the token from the text and its semantic-related nodes within the formulas, which is helpful to capture fine-grained semantic correlations between texts and formulas. Besides, we devise three continual pre-training tasks to further align and fuse the representations of the text and math syntax graph. Experimental results on four tasks in the math domain demonstrate the effectiveness of our approach. Our code and data are publicly available at the link: bluehttps://github.com/RUCAIBox/COMUS.

pdf
Debiased Contrastive Learning of Unsupervised Sentence Representations
Kun Zhou | Beichen Zhang | Xin Zhao | Ji-Rong Wen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, contrastive learning has been shown to be effective in improving pre-trained language models (PLM) to derive high-quality sentence representations. It aims to pull close positive examples to enhance the alignment while push apart irrelevant negatives for the uniformity of the whole representation space.However, previous works mostly adopt in-batch negatives or sample from training data at random. Such a way may cause the sampling bias that improper negatives (false negatives and anisotropy representations) are used to learn sentence representations, which will hurt the uniformity of the representation space.To address it, we present a new framework DCLR (Debiased Contrastive Learning of unsupervised sentence Representations) to alleviate the influence of these improper negatives.In DCLR, we design an instance weighting method to punish false negatives and generate noise-based negatives to guarantee the uniformity of the representation space.Experiments on seven semantic textual similarity tasks show that our approach is more effective than competitive baselines. Our code and data are publicly available at the link: bluehttps://github.com/RUCAIBox/DCLR.

pdf
ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation
Junyi Li | Tianyi Tang | Wayne Xin Zhao | Jian-Yun Nie | Ji-Rong Wen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We study the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR) method is adopted for generating texts in a token-by-token manner. Despite many advantages of AR generation, it usually suffers from inefficient inference. Therefore, non-autoregressive (NAR) models are proposed to generate all target tokens simultaneously. However, NAR models usually generate texts of lower quality due to the absence of token dependency in the output text. In this paper, we propose ELMER: an efficient and effective PLM for NAR text generation to explicitly model the token dependency during NAR generation. By leveraging the early exit technique, ELMER enables the token generations at different layers, according to their prediction confidence (a more confident token will exit at a lower layer). Besides, we propose a novel pre-training objective, Layer Permutation Language Modeling, to pre-train ELMER by permuting the exit layer for each token in sequences. Experiments on three text generation tasks show that ELMER significantly outperforms NAR models and further narrows the performance gap with AR PLMs (ELMER (29.92) vs BART (30.61) ROUGE-L in XSUM) while achieving over 10 times inference speedup.

pdf
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
Tianyi Tang | Junyi Li | Zhipeng Chen | Yiwen Hu | Zhuohao Yu | Wenxun Dai | Wayne Xin Zhao | Jian-yun Nie | Ji-rong Wen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox#2.0.

pdf
SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval
Kun Zhou | Yeyun Gong | Xiao Liu | Wayne Xin Zhao | Yelong Shen | Anlei Dong | Jingwen Lu | Rangan Majumder | Ji-rong Wen | Nan Duan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Sampling proper negatives from a large document pool is vital to effectively train a dense retrieval model.However, existing negative sampling strategies suffer from the uninformative or false negative problem.In this work, we empirically show that according to the measured relevance scores, the negatives ranked around the positives are generally more informative and less likely to be false negatives.Intuitively, these negatives are not too hard (may be false negatives) or too easy (uninformative). They are the ambiguous negatives and need more attention during training.Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.Extensive experiments on four public and one industry datasets show the effectiveness of our approach.We made the code and models publicly available in https://github.com/microsoft/SimXNS.

2021

pdf
Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models
Junyi Li | Tianyi Tang | Wayne Xin Zhao | Zhicheng Wei | Nicholas Jing Yuan | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Ruiyang Ren | Shangwen Lv | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Ruiyang Ren | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both the two procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage reranking. A major contribution is that we introduce the dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved according to each other’s relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for listwise training approach. Extensive experiments show the effectiveness of our approach on both MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.

pdf
Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models
Kun Zhou | Wayne Xin Zhao | Sirui Wang | Fuzheng Zhang | Wei Wu | Ji-Rong Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs. However, it is still challenging to augment semantically relevant examples with sufficient diversity. In this work, we present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs. Based on the original token embeddings, we construct a multinomial mixture for augmenting virtual data embeddings, where a masked language model guarantees the semantic relevance and the Gaussian noise provides the augmentation diversity. Furthermore, a regularized training strategy is proposed to balance the two aspects. Extensive experiments on six datasets show that our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks. Our codes and data are publicly available at bluehttps://github.com/RUCAIBox/VDA.

pdf
A Joint Model for Dropped Pronoun Recovery and Conversational Discourse Parsing in Chinese Conversational Speech
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Nianwen Xue | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we present a neural model for joint dropped pronoun recovery (DPR) and conversational discourse parsing (CDP) in Chinese conversational speech. We show that DPR and CDP are closely related, and a joint model benefits both tasks. We refer to our model as DiscProReco, and it first encodes the tokens in each utterance in a conversation with a directed Graph Convolutional Network (GCN). The token states for an utterance are then aggregated to produce a single state for each utterance. The utterance states are then fed into a biaffine classifier to construct a conversational discourse graph. A second (multi-relational) GCN is then applied to the utterance states to produce a discourse relation-augmented representation for the utterances, which are then fused together with token states in each utterance as input to a dropped pronoun recovery layer. The joint model is trained and evaluated on a new Structure Parsing-enhanced Dropped Pronoun Recovery (SPDPR) data set that we annotated with both two types of information. Experimental results on the SPDPR dataset and other benchmarks show that DiscProReco significantly outperforms the state-of-the-art baselines of both tasks.

pdf
A Pre-training Strategy for Zero-Resource Response Selection in Knowledge-Grounded Conversations
Chongyang Tao | Changyu Chen | Jiazhan Feng | Ji-Rong Wen | Rui Yan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, many studies are emerging towards building a retrieval-based dialogue system that is able to effectively leverage background knowledge (e.g., documents) when conversing with humans. However, it is non-trivial to collect large-scale dialogues that are naturally grounded on the background documents, which hinders the effective and adequate training of knowledge selection and response matching. To overcome the challenge, we consider decomposing the training of the knowledge-grounded response selection into three tasks including: 1) query-passage matching task; 2) query-dialogue history matching task; 3) multi-turn response matching task, and joint learning all these tasks in a unified pre-trained language model. The former two tasks could help the model in knowledge selection and comprehension, while the last task is designed for matching the proper response with the given query and background knowledge (dialogue history). By this means, the model can be learned to select relevant knowledge and distinguish proper response, with the help of ad-hoc retrieval corpora and a large number of ungrounded multi-turn dialogues. Experimental results on two benchmarks of knowledge-grounded response selection indicate that our model can achieve comparable performance with several existing methods that rely on crowd-sourced data for training.

pdf
Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
Peiyu Liu | Ze-Feng Gao | Wayne Xin Zhao | Zhi-Yuan Xie | Zhong-Yi Lu | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents a novel pre-trained language models (PLM) compression approach based on the matrix product operator (short as MPO) from quantum many-body physics. It can decompose an original matrix into central tensors (containing the core information) and auxiliary tensors (with only a small proportion of parameters). With the decomposed MPO structure, we propose a novel fine-tuning strategy by only updating the parameters from the auxiliary tensors, and design an optimization algorithm for MPO-based approximation over stacked network architectures. Our approach can be applied to the original or the compressed PLMs in a general way, which derives a lighter network and significantly reduces the parameters to be fine-tuned. Extensive experiments have demonstrated the effectiveness of the proposed approach in model compression, especially the reduction in fine-tuning parameters (91% reduction on average). The code to reproduce the results of this paper can be found at https://github.com/RUCAIBox/MPOP.

pdf
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li | Tianyi Tang | Gaole He | Jinhao Jiang | Xiaoxuan Hu | Puzhao Xie | Zhipeng Chen | Zhuohao Yu | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

In this paper, we release an open-source library, called TextBox, to provide a unified, modularized, and extensible text generation framework. TextBox aims to support a broad set of text generation tasks and models. In our library, we implement 21 text generation models on 9 benchmark datasets, covering the categories of VAE, GAN, and pretrained language models. Meanwhile, our library maintains sufficient modularity and extensibility by properly decomposing the model architecture, inference, and learning process into highly reusable modules, which allows users to easily incorporate new models into our framework. The above features make TextBox especially suitable for researchers and practitioners to quickly reproduce baseline models and develop new models. TextBox is implemented based on PyTorch, and released under Apache License 2.0 at the link https://github.com/RUCAIBox/TextBox.

pdf
CRSLab: An Open-Source Toolkit for Building Conversational Recommender System
Kun Zhou | Xiaolei Wang | Yuanhang Zhou | Chenzhan Shang | Yuan Cheng | Wayne Xin Zhao | Yaliang Li | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

In recent years, conversational recommender systems (CRSs) have drawn a wide attention in the research community, which focus on providing high-quality recommendations to users via natural language conversations. However, due to diverse scenarios and data formats, existing studies on CRSs lack unified and standardized implementation or comparison. To tackle this challenge, we release an open-source toolkit CRSLab, which provides a unified and extensible framework with highly-decoupled modules to develop CRSs. Based on this framework, we collect 6 commonly used human-annotated CRS datasets and implement 19 models that include advanced techniques such as graph neural networks and pre-training models. Besides, our toolkit provides a series of automatic evaluation protocols and a human-machine interaction interface to evaluate and compare different CRS methods. The project and documents are released at https://github.com/RUCAIBox/CRSLab.

2020

pdf
Towards Topic-Guided Conversational Recommender System
Kun Zhou | Yuanhang Zhou | Wayne Xin Zhao | Xiaoke Wang | Ji-Rong Wen
Proceedings of the 28th International Conference on Computational Linguistics

Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. To develop an effective CRS, the support of high-quality datasets is essential. Existing CRS datasets mainly focus on immediate requests from users, while lack proactive guidance to the recommendation scenario. In this paper, we contribute a new CRS dataset named TG-ReDial (Recommendation through Topic-Guided Dialog). Our dataset has two major features. First, it incorporates topic threads to enforce natural semantic transitions towards the recommendation scenario. Second, it is created in a semi-automatic way, hence human annotation is more reasonable and controllable. Based on TG-ReDial, we present the task of topic-guided conversational recommendation, and propose an effective approach to this task. Extensive experiments have demonstrated the effectiveness of our approach on three sub-tasks, namely topic prediction, item recommendation and response generation. TG-ReDial is available at bluehttps://github.com/RUCAIBox/TG-ReDial.

pdf
Wasserstein Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains
Weijie Yu | Chen Xu | Jun Xu | Liang Pang | Xiaopeng Gao | Xiaozhao Wang | Ji-Rong Wen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

One approach to matching texts from asymmetrical domains is projecting the input sequences into a common semantic space as feature vectors upon which the matching function can be readily defined and learned. In real-world matching practices, it is often observed that with the training goes on, the feature vectors projected from different domains tend to be indistinguishable. The phenomenon, however, is often overlooked in existing matching models. As a result, the feature vectors are constructed without any regularization, which inevitably increases the difficulty of learning the downstream matching functions. In this paper, we propose a novel match method tailored for text matching in asymmetrical domains, called WD-Match. In WD-Match, a Wasserstein distance-based regularizer is defined to regularize the features vectors projected from different domains. As a result, the method enforces the feature projection function to generate vectors such that those correspond to different domains cannot be easily discriminated. The training process of WD-Match amounts to a game that minimizes the matching loss regularized by the Wasserstein distance. WD-Match can be used to improve different text matching methods, by using the method as its underlying matching model. Four popular text matching methods have been exploited in the paper. Experimental results based on four publicly available benchmarks showed that WD-Match consistently outperformed the underlying methods and the baselines.

pdf
Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Ji-Rong Wen | Nianwen Xue
Findings of the Association for Computational Linguistics: EMNLP 2020

Pronouns are often dropped in Chinese conversations and recovering the dropped pronouns is important for NLP applications such as Machine Translation. Existing approaches usually formulate this as a sequence labeling task of predicting whether there is a dropped pronoun before each token and its type. Each utterance is considered to be a sequence and labeled independently. Although these approaches have shown promise, labeling each utterance independently ignores the dependencies between pronouns in neighboring utterances. Modeling these dependencies is critical to improving the performance of dropped pronoun recovery. In this paper, we present a novel framework that combines the strength of Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances. Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models. Exploratory analysis also demonstrates that the GCRF did help to capture the dependencies between pronouns in neighboring utterances, thus contributes to the performance improvements.

2019

pdf
Domain Adaptation for Person-Job Fit with Transferable Deep Global Match Network
Shuqing Bian | Wayne Xin Zhao | Yang Song | Tao Zhang | Ji-Rong Wen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Person-job fit has been an important task which aims to automatically match job positions with suitable candidates. Previous methods mainly focus on solving the match task in single-domain setting, which may not work well when labeled data is limited. We study the domain adaptation problem for person-job fit. We first propose a deep global match network for capturing the global semantic interactions between two sentences from a job posting and a candidate resume respectively. Furthermore, we extend the match network and implement domain adaptation in three levels, sentence-level representation, sentence-level match, and global match. Extensive experiment results on a large real-world dataset consisting of six domains have demonstrated the effectiveness of the proposed model, especially when there is not sufficient labeled data.

pdf
A Neural Citation Count Prediction Model based on Peer Review Text
Siqing Li | Wayne Xin Zhao | Eddy Jing Yin | Ji-Rong Wen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Citation count prediction (CCP) has been an important research task for automatically estimating the future impact of a scholarly paper. Previous studies mainly focus on extracting or mining useful features from the paper itself or the associated authors. An important kind of data signals, peer review text, has not been utilized for the CCP task. In this paper, we take the initiative to utilize peer review data for the CCP task with a neural prediction model. Our focus is to learn a comprehensive semantic representation for peer review text for improving the prediction performance. To achieve this goal, we incorporate the abstract-review match mechanism and the cross-review match mechanism to learn deep features from peer review text. We also consider integrating hand-crafted features via a wide component. The deep and wide components jointly make the prediction. Extensive experiments have demonstrated the usefulness of the peer review data and the effectiveness of the proposed model. Our dataset has been released online.

pdf
Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding
Junyi Li | Wayne Xin Zhao | Ji-Rong Wen | Yang Song
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics from natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model the aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch will be predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketch, and context information. Extensive experiments results have demonstrated the effectiveness of the proposed model.

2013

pdf
Improving Web Search Ranking by Incorporating Structured Annotation of Queries
Xiao Ding | Zhicheng Dou | Bing Qin | Ting Liu | Ji-Rong Wen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2010

pdf
Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches
Shuming Shi | Huibin Zhang | Xiaojie Yuan | Ji-Rong Wen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf
Employing Topic Models for Pattern-based Semantic Class Discovery
Huibin Zhang | Mingjie Zhu | Shuming Shi | Ji-Rong Wen
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Anchor Text Extraction for Academic Search
Shuming Shi | Fei Xing | Mingjie Zhu | Zaiqing Nie | Ji-Rong Wen
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)