Chenguang Zhu


2021

Modeling Entity Knowledge for Fact Verification
Yang Liu | Chenguang Zhu | Michael Zeng
Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

Fact verification is a challenging task of identifying the truthfulness of given claims based on the retrieval of relevant evidence texts. Many claims require understanding and reasoning over external entity information for precise verification. In this paper, we propose a novel fact verification model that uses entity knowledge to enhance its performance. We retrieve descriptive text from Wikipedia for each entity and encode these descriptions with a lightweight network, whose output is fed into the main verification model. Furthermore, we boost model performance by predicting the relatedness between the claim and each piece of evidence and using it as an additional signal. We demonstrate experimentally on the large-scale benchmark dataset FEVER that our framework achieves competitive results, with a FEVER score of 72.89% on the test set.
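
The entity-description idea lends itself to a small sketch: fetch a short description per entity and run it through a lightweight encoder whose pooled output can be appended to the main model's inputs. The fetch helper, encoder choice, and dimensions below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: encode retrieved entity descriptions with a small GRU encoder; the
# pooled vectors would be fed into the main verification model as extra input.
import torch
import torch.nn as nn

class EntityDescriptionEncoder(nn.Module):
    """Lightweight encoder for entity-description token ids."""
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):            # token_ids: (num_entities, seq_len)
        _, hidden = self.gru(self.embed(token_ids))
        return hidden[-1]                    # (num_entities, dim) pooled vectors

def fetch_wikipedia_description(entity: str) -> str:
    """Hypothetical retrieval helper; in practice this queries a Wikipedia dump."""
    raise NotImplementedError
```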

Fusing Context Into Knowledge Graph for Commonsense Question Answering
Yichong Xu | Chenguang Zhu | Ruochen Xu | Yang Liu | Michael Zeng | Xuedong Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Retrieval Enhanced Model for Commonsense Generation
Han Wang | Yang Liu | Chenguang Zhu | Linjun Shou | Ming Gong | Yichong Xu | Michael Zeng
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Want To Reduce Labeling Cost? GPT-3 Can Help
Shuohang Wang | Yang Liu | Yichong Xu | Chenguang Zhu | Michael Zeng
Findings of the Association for Computational Linguistics: EMNLP 2021

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although various methods exist to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework that combines pseudo labels from GPT-3 with human labels, which leads to even better performance. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
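
A toy sketch of the pseudo-labeling recipe described above: query a large model for labels on unlabeled examples and mix them with a smaller pool of human labels. `query_gpt3`, the prompt format, and the label parsing are placeholders, not the paper's actual pipeline.

```python
# Sketch of using a large LM as a low-cost labeler and mixing its pseudo labels
# with human labels. `query_gpt3` stands in for an actual completion API call.
def query_gpt3(prompt):
    """Placeholder for a few-shot completion request to GPT-3."""
    raise NotImplementedError

def pseudo_label(texts, demonstrations, labels=("positive", "negative")):
    data = []
    for text in texts:
        prompt = f"{demonstrations}\nReview: {text}\nSentiment:"
        completion = query_gpt3(prompt).strip().lower()
        # Keep only completions that map cleanly onto the label set.
        if completion in labels:
            data.append((text, completion))
    return data

def build_training_set(human_data, unlabeled_texts, demonstrations):
    # Human labels are kept as-is; GPT-3 labels fill in the rest at lower cost.
    return human_data + pseudo_label(unlabeled_texts, demonstrations)
```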

An Exploratory Study on Long Dialogue Summarization: What Works and What’s Next
Yusen Zhang | Ansong Ni | Tao Yu | Rui Zhang | Chenguang Zhu | Budhaditya Deb | Asli Celikyilmaz | Ahmed Hassan Awadallah | Dragomir Radev
Findings of the Association for Computational Linguistics: EMNLP 2021

Dialogue summarization helps readers capture salient information from long conversations in meetings, interviews, and TV series. However, real-world dialogues pose a great challenge to current summarization models, as the dialogue length typically exceeds the input limits imposed by recent transformer-based pre-trained models, and the interactive nature of dialogues makes relevant information more context-dependent and sparsely distributed than in news articles. In this work, we perform a comprehensive study on long dialogue summarization by investigating three strategies to deal with the lengthy input problem and locate relevant information: (1) extended transformer models such as Longformer, (2) retrieve-then-summarize pipeline models with several dialogue utterance retrieval methods, and (3) hierarchical dialogue encoding models such as HMNet. Our experimental results on three long dialogue datasets (QMSum, MediaSum, SummScreen) show that the retrieve-then-summarize pipeline models yield the best performance. We also demonstrate that the summary quality can be further improved with a stronger retrieval model and pretraining on proper external summarization datasets.
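
A small sketch of one retrieve-then-summarize variant, using TF-IDF similarity between the query and dialogue utterances; this is only one of the retrieval choices such a pipeline can use, and the summarizer call is left abstract.

```python
# Sketch: retrieve the utterances most relevant to a query, then summarize only
# the retrieved text so it fits a pretrained model's input limit.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_utterances(query, utterances, top_k=20):
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(utterances + [query])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top = sorted(sims.argsort()[-top_k:])        # keep original dialogue order
    return [utterances[i] for i in top]

def summarize_dialogue(query, utterances, summarizer):
    selected = retrieve_utterances(query, utterances)
    return summarizer(" ".join(selected))        # summarizer: any seq2seq model
```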

Enhancing Factual Consistency of Abstractive Summarization
Chenguang Zhu | William Hinthorn | Ruochen Xu | Qingkai Zeng | Michael Zeng | Xuedong Huang | Meng Jiang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Automatic abstractive summaries are found to often distort or fabricate facts in the source article. This inconsistency between the summary and the original text seriously limits their applicability. We propose a fact-aware summarization model, FASum, to extract and integrate factual relations into the summary generation process via graph attention. We then design a factual corrector model, FC, to automatically correct factual errors in summaries generated by existing systems. Empirical results show that fact-aware summarization produces abstractive summaries with higher factual consistency than existing systems, and that the correction model improves the factual consistency of given summaries by modifying only a few keywords.
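
The kind of inconsistency being targeted can be illustrated with a much simpler heuristic than FASum or FC: flag summary entities that never appear in the source article, a common symptom of fabricated facts. This is only an illustration of the problem, not the paper's models.

```python
# Illustrative check (not FASum/FC): summary entities absent from the source
# article are likely fabricated or distorted facts.
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed

def unsupported_entities(article, summary):
    article_text = article.lower()
    return [ent.text for ent in nlp(summary).ents
            if ent.text.lower() not in article_text]
```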

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
Yu-An Chung | Chenguang Zhu | Michael Zeng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Spoken language understanding (SLU) requires a model to analyze input acoustic signal to understand its linguistic content and make predictions. To boost the models’ performance, various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text. However, the inherent disparities between the two modalities necessitate a mutual analysis. In this paper, we propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules. Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text. Thus, during fine-tuning, the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge of an input acoustic signal. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, SPLAT improves the previous state-of-the-art performance on the Spoken SQuAD dataset by more than 10%.
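
A rough sketch of the alignment idea, assuming mean-pooled hidden states from the speech and text modules on paired examples; the L2 form here is a simple stand-in for the paper's alignment objective.

```python
# Sketch: align pooled speech and text representations of *paired* examples in a
# shared space, alongside the usual masked-prediction losses on each module.
import torch
import torch.nn.functional as F

def alignment_loss(speech_hidden: torch.Tensor,   # (batch, speech_len, dim)
                   text_hidden: torch.Tensor      # (batch, text_len, dim)
                   ) -> torch.Tensor:
    speech_vec = speech_hidden.mean(dim=1)        # mean-pool over time frames
    text_vec = text_hidden.mean(dim=1)            # mean-pool over tokens
    return F.mse_loss(speech_vec, text_vec)

# total_loss = mlm_loss_speech + mlm_loss_text + lambda_align * alignment_loss(s, t)
```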

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
Chenguang Zhu | Yang Liu | Jie Mei | Michael Zeng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper introduces MediaSum, a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overviews and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is an order of magnitude larger and contains complex multi-party conversations from multiple domains. We conduct statistical analysis to demonstrate the unique positional bias exhibited in transcripts of television and radio interviews. We also show that MediaSum can be used in transfer learning to improve a model’s performance on other dialogue summarization tasks.

RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems
Baolin Peng | Chunyuan Li | Zhu Zhang | Chenguang Zhu | Jinchao Li | Jianfeng Gao
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

For task-oriented dialog systems to be maximally useful, they must be able to process conversations in a way that is (1) generalizable with a small number of training examples for new task domains, and (2) robust to user input in various styles, modalities, or domains. In pursuit of these goals, we introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains. By including tasks with limited training data, RADDLE is designed to favor and encourage models with a strong generalization ability. RADDLE also includes a diagnostic checklist that facilitates detailed robustness analysis in aspects such as language variations, speech errors, unseen entities, and out-of-domain utterances. We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain. We also propose adversarial training to improve model robustness against noisy inputs. Overall, existing models are less than satisfactory in robustness evaluation, which suggests opportunities for future improvement.

Injecting Entity Types into Entity-Guided Text Generation
Xiangyu Dong | Wenhao Yu | Chenguang Zhu | Meng Jiang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent successes in deep generative modeling have led to significant advances in natural language generation (NLG). Incorporating entities into neural generation models has yielded great improvements by helping to infer the summary topic and generate coherent content. To enhance the role of entities in NLG, in this paper we aim to model the entity type in the decoding phase so as to generate contextual words accurately. We develop a novel NLG model that produces a target sequence based on a given list of entities. Our model has a multi-step decoder that injects entity types into the process of entity mention generation. Experiments on two public news datasets demonstrate that type injection performs better than existing type-embedding concatenation baselines.

Sentence-Permuted Paragraph Generation
Wenhao Yu | Chenguang Zhu | Tong Zhao | Zhichun Guo | Meng Jiang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Generating paragraphs with diverse content is important in many applications. Existing generation models produce similar content from homogenized contexts due to the fixed left-to-right sentence order. Our idea is to permute the sentence order to improve the content diversity of multi-sentence paragraphs. We propose a novel framework, PermGen, whose objective is to maximize the expected log-likelihood of output paragraph distributions with respect to all possible sentence orders. PermGen uses hierarchical positional embeddings and new procedures for training and decoding in sentence-permuted generation. Experiments on three paragraph generation benchmarks demonstrate that PermGen generates more diverse outputs of higher quality than existing models.
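
A compressed sketch of the training objective: sample sentence orders and average the negative log-likelihood over the sampled permutations, with each target token carrying both its original sentence index and its within-sentence position (the hierarchical positional embedding). The model interface below is an assumption, not PermGen's actual code.

```python
# Sketch: approximate the expectation over sentence orders by sampling random
# permutations of the target sentences during training.
import random
import torch

def permuted_nll(model, source_ids, target_sentences, num_samples=2):
    """target_sentences: list of 1-D token-id tensors, one per sentence.
    `model` is assumed to return a scalar NLL given source ids, target ids, and
    hierarchical position ids (original sentence index, within-sentence index)."""
    losses = []
    for _ in range(num_samples):
        order = random.sample(range(len(target_sentences)), len(target_sentences))
        tokens, sent_pos, tok_pos = [], [], []
        for idx in order:
            sent = target_sentences[idx]
            tokens.append(sent)
            sent_pos.append(torch.full((len(sent),), idx))  # which sentence it came from
            tok_pos.append(torch.arange(len(sent)))         # position inside the sentence
        losses.append(model(source_ids,
                            torch.cat(tokens),
                            torch.cat(sent_pos),
                            torch.cat(tok_pos)))
    return torch.stack(losses).mean()
```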

2020

Mixed-Lingual Pre-training for Cross-lingual Summarization
Ruochen Xu | Chenguang Zhu | Yu Shi | Michael Zeng | Xuedong Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Cross-lingual Summarization (CLS) aims at producing a summary in the target language for an article in the source language. Traditional solutions employ a two-step approach, i.e. translate -> summarize or summarize -> translate. Recently, end-to-end models have achieved better results, but these approaches are mostly limited by their dependence on large-scale labeled data. We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks such as translation and monolingual tasks such as masked language modeling. Thus, our model can leverage massive monolingual data to enhance its modeling of language. Moreover, the architecture has no task-specific components, which saves memory and increases optimization efficiency. We show in experiments that this pre-training scheme can effectively boost the performance of cross-lingual summarization. On the NCLS dataset, our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 points over state-of-the-art results.

Few-shot Natural Language Generation for Task-Oriented Dialog
Baolin Peng | Chenguang Zhu | Chunyuan Li | Xiujun Li | Jinchao Li | Michael Zeng | Jianfeng Gao
Findings of the Association for Computational Linguistics: EMNLP 2020

As a crucial component in task-oriented dialog systems, the Natural Language Generation (NLG) module converts a dialog act represented in semantic form into a response in natural language. The success of traditional template-based or statistical models typically relies on heavily annotated data, which is infeasible for new domains. Therefore, it is pivotal for an NLG system to generalize well with limited labelled data in real applications. To this end, we present FewshotWOZ, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems. Further, we develop the SC-GPT model. It is pre-trained on a large set of annotated NLG corpora to acquire the controllable generation ability, and fine-tuned with only a few domain-specific labels to adapt to new domains. Experiments on FewshotWOZ and the large Multi-Domain-WOZ dataset show that the proposed SC-GPT significantly outperforms existing methods, as measured by various automatic metrics and human evaluations.
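
The setup relies on linearizing a structured dialog act into plain text so a GPT-style model can be fine-tuned to map it to a natural-language response; the exact serialization below is an illustrative assumption, not necessarily SC-GPT's format.

```python
# Sketch: serialize a dialog act (intent + slot-value pairs) into text so it can
# be paired with a response for fine-tuning a GPT-style generator.
def linearize_dialog_act(intent, slots):
    slot_str = ", ".join(f"{k} = {v}" for k, v in slots.items())
    return f"{intent} ( {slot_str} )"

# linearize_dialog_act("inform", {"name": "Blue Spice", "food": "Italian"})
#   -> "inform ( name = Blue Spice, food = Italian )"
# Fine-tuning pairs then look like "<linearized act> & <response>", with only a
# handful of such pairs needed per new domain.
```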

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining
Chenguang Zhu | Ruochen Xu | Michael Zeng | Xuedong Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

With the abundance of automatic meeting transcripts, meeting summarization is of great interest to both participants and other parties. Traditional methods of summarizing meetings depend on complex multi-step pipelines that make joint optimization intractable. Meanwhile, there are a handful of deep neural models for text summarization and dialogue systems. However, the semantic structure and style of meeting transcripts are quite different from those of articles and conversations. In this paper, we propose a novel abstractive summary network that adapts to the meeting scenario. We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the differences among speakers. Furthermore, due to the inadequacy of meeting summary data, we pretrain the model on large-scale news summary data. Empirical results show that our model outperforms previous approaches in both automatic metrics and human evaluation. For example, on the ICSI dataset, the ROUGE-1 score increases from 34.66% to 46.28%.
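
A rough sketch of the two ideas named above, hierarchical encoding and speaker role vectors: encode tokens within each turn, pool, add a learned role embedding, then encode across turns. Dimensions, pooling, and layer counts are illustrative choices, not the paper's exact architecture.

```python
# Sketch: two-level transformer encoder for meeting transcripts with a learned
# role embedding added to each turn representation.
import torch
import torch.nn as nn

class HierarchicalMeetingEncoder(nn.Module):
    def __init__(self, vocab_size=30000, num_roles=10, dim=256, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.role_embed = nn.Embedding(num_roles, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.word_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.turn_encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, turn_token_ids, speaker_roles):
        # turn_token_ids: (num_turns, turn_len); speaker_roles: (num_turns,)
        words = self.word_encoder(self.embed(turn_token_ids))   # token-level encoding
        turns = words.mean(dim=1)                               # pool each turn
        turns = turns + self.role_embed(speaker_roles)          # add speaker role vector
        return self.turn_encoder(turns.unsqueeze(0))            # turn-level encoding
```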

TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
Ziyi Yang | Chenguang Zhu | Robert Gmyr | Michael Zeng | Xuedong Huang | Eric Darve
Findings of the Association for Computational Linguistics: EMNLP 2020

Text summarization aims to extract essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models rely on the recurrent neural network framework, while the recently proposed transformer exhibits much greater capability. Moreover, most previous summarization models ignore the abundant unlabeled corpora available for pretraining. To address these issues, we propose TED, a transformer-based unsupervised abstractive summarization system pretrained on large-scale data. We first leverage the lead bias in news articles to pretrain the model on millions of unlabeled documents. Next, we finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of generated summaries. Notably, TED outperforms all unsupervised abstractive baselines on the NYT, CNN/DM, and English Gigaword datasets with various document styles. Further analysis shows that the summaries generated by TED are highly abstractive and that each component in the objective function of TED is highly effective.
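
The lead-bias pretraining step lends itself to a tiny data-construction sketch: treat the opening sentences of each news article as a pseudo summary and the remainder as the source. The sentence splitter and the three-sentence cut-off are assumptions for illustration.

```python
# Sketch: build (source, pseudo-summary) pairs from unlabeled news articles by
# exploiting lead bias -- the opening sentences tend to summarize the article.
import re

def lead_bias_pair(article, lead_sentences=3):
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    if len(sentences) <= lead_sentences:
        return None                          # too short to form a useful pair
    pseudo_summary = " ".join(sentences[:lead_sentences])
    source = " ".join(sentences[lead_sentences:])
    return source, pseudo_summary
```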

Boosting Naturalness of Language in Task-oriented Dialogues via Adversarial Training
Chenguang Zhu
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

The natural language generation (NLG) module in a task-oriented dialogue system produces user-facing utterances conveying required information. Thus, it is critical for the generated response to be natural and fluent. We propose to integrate adversarial training to produce more human-like responses. The model uses the Straight-Through Gumbel-Softmax estimator for gradient computation. We also propose a two-stage training scheme to boost performance. Empirical results show that adversarial training can effectively improve the quality of language generation in both automatic and human evaluations. For example, on the RNN-LG Restaurant dataset, our model AdvNLG outperforms the previous state-of-the-art result by 3.6% in BLEU.
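
PyTorch's built-in `gumbel_softmax` with `hard=True` implements the Straight-Through estimator mentioned above; here is a minimal sketch of passing near-discrete generator outputs to a discriminator while keeping gradients, with the generator and discriminator interfaces assumed.

```python
# Sketch: Straight-Through Gumbel-Softmax lets the discriminator see one-hot
# token choices in the forward pass while gradients still flow to the generator.
import torch
import torch.nn.functional as F

def discriminator_score(gen_logits: torch.Tensor,       # (batch, seq_len, vocab)
                        disc_embedding: torch.nn.Embedding,
                        discriminator,                    # any module over embeddings
                        tau: float = 1.0) -> torch.Tensor:
    # hard=True: one-hot in the forward pass, soft gradients in the backward pass.
    one_hot = F.gumbel_softmax(gen_logits, tau=tau, hard=True)
    token_embeds = one_hot @ disc_embedding.weight        # differentiable "lookup"
    return discriminator(token_embeds)                    # e.g. prob. of being human-written
```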

2019

Embedding Imputation with Grounded Language Information
Ziyi Yang | Chenguang Zhu | Vin Sachidananda | Eric Darve
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Due to the ubiquitous use of embeddings as input representations for a wide range of natural language tasks, imputation of embeddings for rare and unseen words is a critical problem in language processing. Embedding imputation involves learning representations for rare or unseen words during the training of an embedding model, often in a post-hoc manner. In this paper, we propose an approach for embedding imputation which uses grounded information in the form of a knowledge graph. This is in contrast to existing approaches which typically make use of vector space properties or subword information. We propose an online method to construct a graph from grounded information and design an algorithm to map from the resulting graphical structure to the space of the pre-trained embeddings. Finally, we evaluate our approach on a range of rare and unseen word tasks across various domains and show that our model can learn better representations. For example, on the Card-660 task our method improves Pearson’s and Spearman’s correlation coefficients upon the state-of-the-art by 11% and 17.8% respectively using GloVe embeddings.
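
A deliberately simplified stand-in for the paper's graph-based imputation: average the pretrained embeddings of a rare word's knowledge-graph neighbors. The actual method learns a mapping from the constructed graph to the embedding space; this only conveys the grounding idea.

```python
# Simplified sketch: impute a missing word vector from the pretrained vectors of
# its knowledge-graph neighbors (the paper learns this mapping instead).
import numpy as np

def impute_embedding(word, graph_neighbors, embeddings, dim=300):
    """graph_neighbors: dict word -> list of related words from a knowledge graph.
    embeddings: dict word -> np.ndarray for words that already have vectors."""
    neighbor_vecs = [embeddings[n] for n in graph_neighbors.get(word, [])
                     if n in embeddings]
    if not neighbor_vecs:
        return np.zeros(dim)                 # no grounded information available
    return np.mean(neighbor_vecs, axis=0)
```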

SIM: A Slot-Independent Neural Model for Dialogue State Tracking
Chenguang Zhu | Michael Zeng | Xuedong Huang
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Dialogue state tracking is an important component in task-oriented dialogue systems to identify users’ goals and requests as a dialogue proceeds. However, as most previous models are dependent on dialogue slots, the model complexity soars when the number of slots increases. In this paper, we put forward a slot-independent neural model (SIM) to track dialogue states while keeping the model complexity invariant to the number of dialogue slots. The model utilizes attention mechanisms between user utterance and system actions. SIM achieves state-of-the-art results on WoZ and DSTC2 tasks, with only 20% of the model size of previous models.

Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
Jianmo Ni | Chenguang Zhu | Weizhu Chen | Julian McAuley
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Open-domain question answering remains a challenging task as it requires models that are capable of understanding questions and answers, collecting useful information, and reasoning over evidence. Previous work typically formulates this task as a reading comprehension or entailment problem given evidence retrieved from search engines. However, existing techniques struggle to retrieve indirectly related evidence when no directly related evidence is provided, especially for complex questions where it is hard to parse precisely what the question asks. In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process. We build (1) an essential term selector which first identifies the most important words in a question, then reformulates the query and searches for related evidence; and (2) an enhanced reader that distinguishes between essential terms and distracting words to predict the answer. We evaluate our model on multiple open-domain QA datasets, notably achieving the level of the state-of-the-art on the AI2 Reasoning Challenge (ARC) dataset.
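
A heuristic sketch of the query-reformulation step: keep the question words a selector scores as essential and use them as the retrieval query. The paper trains a neural selector; plain IDF scores stand in here purely for illustration.

```python
# Sketch: reformulate an open-domain question into a retrieval query built from
# its highest-scoring ("essential") terms.
def reformulate_query(question, idf, top_k=5):
    words = question.lower().split()
    essential = set(sorted(words, key=lambda w: idf.get(w, 0.0), reverse=True)[:top_k])
    # Preserve the original word order in the reformulated query.
    return " ".join(w for w in words if w in essential)
```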

Parameter-free Sentence Embedding via Orthogonal Basis
Ziyi Yang | Chenguang Zhu | Weizhu Chen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a simple and robust non-parameterized approach for building sentence representations. Inspired by the Gram-Schmidt Process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence. We model the semantic meaning of a word in a sentence based on two aspects. One is its relatedness to the word vector subspace already spanned by its contextual words. The other is the word’s novel semantic meaning, which shall be introduced as a new basis vector perpendicular to this existing subspace. Following this motivation, we develop an innovative method based on orthogonal bases to combine pre-trained word embeddings into sentence representations. This approach requires zero parameters and offers efficient inference. We evaluate our approach on 11 downstream NLP tasks. Our model shows superior performance compared with non-parameterized alternatives and is competitive with other approaches relying on either large amounts of labelled data or prolonged training time.
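
A compact sketch of the orthogonal-basis intuition, assuming pretrained word vectors in a dict: split each word vector into the component lying in the subspace spanned by the other words' vectors and the orthogonal "novelty" component, then weight words by how novel they are. The full method uses windowed contexts and singular-value-based weights; this sketch only shows the decomposition.

```python
# Sketch: weight each word vector by the norm of its component orthogonal to the
# subspace spanned by the other words' vectors (its "novel" meaning).
import numpy as np

def sentence_embedding(words, vectors):
    """words: list of tokens; vectors: dict token -> 1-D np.ndarray (same dim)."""
    vecs = [vectors[w] for w in words if w in vectors]
    if len(vecs) < 2:
        return vecs[0].copy() if vecs else None
    emb = np.zeros_like(vecs[0], dtype=float)
    for i, v in enumerate(vecs):
        context = np.stack([u for j, u in enumerate(vecs) if j != i], axis=1)
        q, _ = np.linalg.qr(context)          # orthonormal basis of the context subspace
        novel = v - q @ (q.T @ v)             # component outside that subspace
        weight = np.linalg.norm(novel) / (np.linalg.norm(v) + 1e-8)
        emb += weight * v
    return emb
```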

Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue
Chenguang Zhu | Michael Zeng | Xuedong Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In task-oriented dialogues, Natural Language Generation (NLG) is the final yet crucial step in producing user-facing system utterances. The result of NLG is directly related to the perceived quality and usability of a dialogue system. While most existing systems provide semantically correct responses given the goals to present, they struggle to match the variation and fluency of human language. In this paper, we propose a novel multi-task learning framework, NLG-LM, for natural language generation. In addition to generating high-quality responses conveying the required information, it also explicitly targets naturalness in generated responses via an unconditioned language model. This can significantly improve the learning of style and variation in human language. Empirical results show that this multi-task learning framework outperforms previous models across multiple datasets. For example, it improves the previous best BLEU score on the E2E-NLG dataset by 2.2%, and on the Laptop dataset by 6.1%.
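
A short sketch of the two-loss setup suggested above: the same decoder is trained on the conditioned generation task and, as an auxiliary task, on unconditioned language modeling over the response text. The model interface and the mixing weight are assumptions.

```python
# Sketch: combine the conditioned generation loss with an unconditioned
# language-modeling loss over responses to encourage natural phrasing.
def nlg_lm_loss(model, dialog_act_ids, response_ids, lm_weight=0.5):
    """`model(src, tgt)` returns the generation NLL; `model(None, tgt)` is assumed
    to run the decoder alone as an unconditioned language model."""
    generation_loss = model(dialog_act_ids, response_ids)
    lm_loss = model(None, response_ids)
    return generation_loss + lm_weight * lm_loss
```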