Yang Liu

Microsoft Cognitive Services Research

Other people with similar names: Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (May refer to several people), Yang Liu (3M Health Information Systems), Yang Liu (University of Helsinki), Yang Liu (National University of Defense Technology), Yang Liu (Edinburgh), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (刘扬; Ph.D Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Peking University), Yang Liu (Samsung Research Center Beijing), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu (Wilfrid Laurier University)


2022

End-to-End Segmentation-based News Summarization
Yang Liu | Chenguang Zhu | Michael Zeng
Findings of the Association for Computational Linguistics: ACL 2022

In this paper, we introduce a new way of digesting news content: the task of segmenting a news article into multiple sections and generating the corresponding summary for each section. We make two contributions towards this new task. First, we create and make available a dataset, SegNews, consisting of 27k news articles with sections and aligned heading-style section summaries. Second, we propose a novel segmentation-based language generation model, adapted from pre-trained language models, that can jointly segment a document and produce a summary for each section. Experimental results on SegNews demonstrate that our model outperforms several state-of-the-art sequence-to-sequence generation models on this new task.
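
As a rough illustration of the task format described above, here is a minimal sketch in Python. The field names are hypothetical; the released dataset's actual schema may differ.

```python
# One hypothetical SegNews-style example: the model reads the raw article and
# must jointly predict the section boundaries and a heading-style summary per
# section. Field names are illustrative assumptions, not the dataset's schema.
example = {
    "article_sentences": ["Sent 1 ...", "Sent 2 ...", "Sent 3 ...", "Sent 4 ..."],
    "section_boundaries": [0, 2],   # sections begin at sentences 0 and 2
    "section_summaries": ["Heading for section one", "Heading for section two"],
}

def sections(ex):
    """Yield (section sentences, heading-style summary) pairs for one example."""
    bounds = ex["section_boundaries"] + [len(ex["article_sentences"])]
    for i, summary in enumerate(ex["section_summaries"]):
        yield ex["article_sentences"][bounds[i]:bounds[i + 1]], summary

for sents, heading in sections(example):
    print(heading, "<-", sents)
```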

Unsupervised Multi-Granularity Summarization
Ming Zhong | Yang Liu | Suyu Ge | Yuning Mao | Yizhu Jiao | Xingxing Zhang | Yichong Xu | Chenguang Zhu | Michael Zeng | Jiawei Han
Findings of the Association for Computational Linguistics: EMNLP 2022

Text summarization is a user-preference-based task, i.e., for one document, users often have different priorities for the summary. As a key aspect of customization in summarization, granularity measures the semantic coverage between the summary and the source document. However, developing systems that can generate summaries with customizable semantic coverage remains an under-explored topic. In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum. We take events as the basic semantic units of the source documents and propose to rank these events by their salience. We also develop a model that summarizes input documents with given events as anchors and hints. By inputting different numbers of events, GranuSum can produce multi-granular summaries in an unsupervised manner. Meanwhile, we annotate a new benchmark, GranuDUC, which contains multiple summaries at different granularities for each document cluster. Experimental results confirm the substantial superiority of GranuSum over strong baselines on multi-granularity summarization. Furthermore, by exploiting the event information, GranuSum also exhibits state-of-the-art performance under the conventional unsupervised abstractive setting.
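
A minimal sketch of the event-anchored pipeline described above, where `extract_events`, `rank_by_salience`, and `summarize_with_hints` are hypothetical stand-ins for the paper's components:

```python
def extract_events(document: str) -> list:
    # placeholder: the paper extracts predicate-argument style events
    return ["company announces merger", "regulators open a review"]

def rank_by_salience(events: list, document: str) -> list:
    # placeholder salience: here, simple frequency of the event string
    return sorted(events, key=document.count, reverse=True)

def summarize_with_hints(document: str, hints: str) -> str:
    # placeholder for the unsupervised summarizer conditioned on event hints
    return f"[summary anchored on: {hints}]"

def summarize(document: str, granularity_k: int) -> str:
    """Summarize at a chosen granularity by varying how many events are fed in."""
    events = rank_by_salience(extract_events(document), document)
    hints = " | ".join(events[:granularity_k])
    # A larger k supplies more event anchors, yielding a finer-grained summary.
    return summarize_with_hints(document, hints)
```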

Task Compass: Scaling Multi-task Pre-training with Task Prefix
Zhuosheng Zhang | Shuohang Wang | Yichong Xu | Yuwei Fang | Wenhao Yu | Yang Liu | Hai Zhao | Chenguang Zhu | Michael Zeng
Findings of the Association for Computational Linguistics: EMNLP 2022

Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models. Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks. To tackle this challenge, we propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks. We conduct extensive experiments on 40 datasets, which show that our model can not only serve as a strong foundation backbone for a wide range of tasks but also serve as a probing tool for analyzing task relationships. The task relationships reflected by the prefixes align with transfer learning performance between tasks. They also suggest directions for data augmentation with complementary tasks, which help our model achieve human-parity results on commonsense reasoning leaderboards. Code is available at https://github.com/cooelf/CompassMTL.
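
A minimal sketch of the task-prefix idea; the prefix format and task names here are illustrative, not the paper's exact tokens:

```python
def with_task_prefix(task: str, text: str) -> str:
    """Prepend a task prefix so the shared model can condition on task identity."""
    return f"[{task}] {text}"

# One mixed batch drawn from several supervised tasks, each marked by its prefix.
mixed_batch = [
    with_task_prefix("nli", "premise: ... hypothesis: ..."),
    with_task_prefix("qa", "question: ... context: ..."),
    with_task_prefix("sentiment", "review: ..."),
]
# After pre-training, the learned prefix representations can be compared
# (e.g., by cosine similarity) to probe which tasks are related.
```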

AdaPrompt: Adaptive Model Training for Prompt-based NLP
Yulong Chen | Yang Liu | Li Dong | Shuohang Wang | Chenguang Zhu | Michael Zeng | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Prompt-based learning, with its capability to tackle zero-shot and few-shot NLP tasks, has gained much attention in the community. The main idea is to bridge the gap between NLP downstream tasks and language modeling (LM) by mapping these tasks into natural language prompts, which are then filled by pre-trained language models (PLMs). However, for prompt learning, there are still two salient gaps between NLP tasks and pretraining. First, prompt information is not necessarily sufficiently present during LM pre-training. Second, task-specific data are not necessarily well represented during pre-training. We address these two issues by proposing AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs by making use of both task and prompt characteristics. In addition, we make use of knowledge in Natural Language Inference models for deriving adaptive verbalizers. Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings. In addition, in zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
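
One ingredient described above is deriving verbalizers from an NLI model. A minimal sketch, assuming an off-the-shelf MNLI checkpoint ("roberta-large-mnli") stands in for the paper's model, an illustrative entailment template, and a recent transformers version (for the `top_k` argument):

```python
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def pick_verbalizer(label_description: str, candidates: list) -> str:
    """Choose the candidate label word most entailed by the label's meaning."""
    def entailment(word: str) -> float:
        scores = nli({"text": f"This text is {label_description}.",
                      "text_pair": f"This text is {word}."}, top_k=None)
        return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return max(candidates, key=entailment)

print(pick_verbalizer("a positive movie review", ["great", "terrible", "fine"]))
```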

Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data
Shuohang Wang | Yichong Xu | Yuwei Fang | Yang Liu | Siqi Sun | Ruochen Xu | Chenguang Zhu | Michael Zeng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-based methods have been shown to be effective in NLP tasks by introducing external knowledge. However, indexing and retrieving from large-scale corpora bring considerable computational cost. Surprisingly, we found that REtrieving from the traINing datA (REINA) alone can lead to significant gains on multiple NLG and NLU tasks. We retrieve the labeled training instances most similar to the input text and then concatenate them with the input to feed into the model to generate the output. Experimental results show that this simple method can achieve significantly better performance on a variety of NLU and NLG tasks, including summarization, machine translation, language modeling, and question answering. For instance, our proposed method achieved state-of-the-art results on XSum, BigPatent, and CommonsenseQA. Our code is released at https://github.com/microsoft/REINA.
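
A minimal sketch of the retrieve-and-concatenate recipe described above, with TF-IDF similarity standing in for the paper's index and hypothetical in-memory training lists:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical labeled training data held in memory
train_inputs = ["an article about the championship game ...",
                "an article about quarterly earnings ..."]
train_labels = ["sports summary ...", "finance summary ..."]

vectorizer = TfidfVectorizer().fit(train_inputs)
train_matrix = vectorizer.transform(train_inputs)

def augment_with_retrieval(query: str, k: int = 2) -> str:
    """Concatenate the query with its k most similar labeled training pairs."""
    sims = cosine_similarity(vectorizer.transform([query]), train_matrix)[0]
    top = sims.argsort()[::-1][:k]
    retrieved = " ".join(f"{train_inputs[i]} => {train_labels[i]}" for i in top)
    return f"{query} [SEP] {retrieved}"  # fed to the seq2seq model as input

print(augment_with_retrieval("an article about the stock market ..."))
```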

Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong | Yang Liu | Da Yin | Yuning Mao | Yizhu Jiao | Pengfei Liu | Chenguang Zhu | Heng Ji | Jiawei Han
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG evaluation as a Boolean Question Answering (QA) task, and by guiding the model with different questions, we can use one evaluator to evaluate from multiple dimensions. Furthermore, thanks to the unified Boolean QA format, we are able to introduce an intermediate learning phase that enables UniEval to incorporate external knowledge from multiple related tasks and gain further improvement. Experiments on three typical NLG tasks show that UniEval correlates substantially better with human judgments than existing metrics. Specifically, compared to the top-performing unified evaluators, UniEval achieves a 23% higher correlation on text summarization, and over 43% on dialogue response generation. Also, UniEval demonstrates a strong zero-shot learning ability for unseen evaluation dimensions and tasks. Source code, data, and all pre-trained evaluators are available at https://github.com/maszhongming/UniEval.
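
A minimal sketch of the Boolean QA scoring scheme, assuming a generic "t5-base" checkpoint in place of the released pre-trained evaluators and an illustrative question template:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def dimension_score(question: str, summary: str, document: str) -> float:
    """Score one dimension as P("Yes") among {"Yes", "No"} for a yes/no question."""
    prompt = f"question: {question} summary: {summary} document: {document}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Look at the distribution over the first decoded token only.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
    yes_id = tokenizer.convert_tokens_to_ids("▁Yes")  # sentencepiece pieces
    no_id = tokenizer.convert_tokens_to_ids("▁No")
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()

coherence = dimension_score("Is this a coherent summary of the document?",
                            "the summary text ...", "the source document ...")
```

Swapping in a different question (e.g., about fluency) reuses the same evaluator for another dimension, which is the point of the unified Boolean QA format.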

ExPUNations: Augmenting Puns with Keywords and Explanations
Jiao Sun | Anjali Narayan-Chen | Shereen Oraby | Alessandra Cervone | Tagyoung Chung | Jing Huang | Yang Liu | Nanyun Peng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The tasks of humor understanding and generation are challenging and subjective even for humans, requiring commonsense and real-world knowledge to master. Puns, in particular, add the challenge of fusing that knowledge with the ability to interpret lexical-semantic ambiguity. In this paper, we present the ExPUNations (ExPUN) dataset, in which we augment an existing dataset of puns with detailed crowdsourced annotations of keywords denoting the most distinctive words that make the text funny, pun explanations describing why the text is funny, and fine-grained funniness ratings. This is the first humor dataset with such extensive and fine-grained annotations specifically for puns. Based on these annotations, we propose two tasks: explanation generation to aid with pun classification, and keyword-conditioned pun generation, to challenge the current state-of-the-art natural language understanding and generation models’ ability to understand and generate humor. We showcase that the annotated keywords we collect are helpful for generating better novel humorous texts in human evaluation, and that our natural language explanations can be leveraged to improve both the accuracy and robustness of humor classifiers.

2021

Want To Reduce Labeling Cost? GPT-3 Can Help
Shuohang Wang | Yang Liu | Yichong Xu | Chenguang Zhu | Michael Zeng
Findings of the Association for Computational Linguistics: EMNLP 2021

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
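
A minimal sketch of the pseudo-label/human-label mixture described above; `llm_label` is a hypothetical stub for a GPT-3 completion call, and the budget split is illustrative:

```python
import random

def llm_label(text: str, demonstrations: list) -> str:
    """Hypothetical few-shot labeler standing in for a GPT-3 completion call."""
    prompt = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in demonstrations)
    prompt += f"\nInput: {text}\nLabel:"
    # placeholder: in practice, send `prompt` to the completion endpoint
    return "positive"

def build_training_set(unlabeled, human_pairs, human_budget=100):
    """Spend a small human-annotation budget; pseudo-label the rest with the LLM."""
    human = random.sample(human_pairs, k=min(human_budget, len(human_pairs)))
    pseudo = [(x, llm_label(x, demonstrations=human[:8])) for x in unlabeled]
    return human + pseudo  # the downstream model trains on this mixture
```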

Modeling Entity Knowledge for Fact Verification
Yang Liu | Chenguang Zhu | Michael Zeng
Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

Fact verification is the challenging task of identifying the truthfulness of given claims based on the retrieval of relevant evidence texts. Many claims require understanding and reasoning over external entity information for precise verification. In this paper, we propose a novel fact verification model that uses entity knowledge to enhance its performance. We retrieve descriptive text from Wikipedia for each entity, and then encode these descriptions with a smaller, lightweight network to be fed into the main verification model. Furthermore, we boost model performance by predicting the relatedness between the claim and each piece of evidence as an additional signal. We demonstrate experimentally on the large-scale benchmark dataset FEVER that our framework achieves competitive results, with a FEVER score of 72.89% on the test set.
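
A minimal sketch of the lightweight entity-description encoder described above, in PyTorch; the dimensions, GRU choice, and fusion step are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class EntityDescriptionEncoder(nn.Module):
    """A small, lightweight encoder for retrieved Wikipedia entity descriptions."""
    def __init__(self, vocab_size: int = 30522, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, description_ids: torch.Tensor) -> torch.Tensor:
        # description_ids: (batch, seq_len) token ids of an entity description
        _, h = self.encoder(self.embed(description_ids))
        return h.squeeze(0)  # (batch, dim): one vector per entity

# The entity vectors would then be concatenated with the claim/evidence
# representation from the main verification model before classification.
```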