Jian Guan


2021

Stylized Story Generation with Style-Guided Planning
Xiangzhe Kong | Jialiang Huang | Ziquan Tung | Jian Guan | Minlie Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
Jian Guan | Xiaoxi Mao | Changjie Fan | Zitao Liu | Wenbiao Ding | Minlie Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Generating long and coherent text is an important but challenging task, particularly for open-ended language generation tasks such as story generation. Despite the success in modeling intra-sentence coherence, existing generation models (e.g., BART) still struggle to maintain a coherent event sequence throughout the generated text. We conjecture that this is because of the difficulty for the decoder to capture the high-level semantics and discourse structures in the context beyond token-level co-occurrence. In this paper, we propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process. To this end, we propose two pretraining objectives to learn the representations by predicting inter-sentence semantic similarity and distinguishing between normal and shuffled sentence orders. Extensive experiments show that our model can generate more coherent texts than state-of-the-art baselines.
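
As a rough illustration of the second pretraining objective mentioned in the abstract above (distinguishing normal from shuffled sentence orders), the following Python sketch builds one positive and one negative example from a list of sentences. The function name `make_order_examples` and the labeling convention (1 for the original order, 0 for a shuffled order) are assumptions made here for illustration; the paper's actual data construction may differ.

```python
import random


def make_order_examples(sentences, rng):
    """Build (sentences, label) pairs for a hypothetical order-discrimination task.

    Label 1 marks the original order, label 0 a shuffled order.
    """
    original = list(sentences)
    # At least two distinct sentences are needed for a shuffle to change the order.
    if len(set(original)) < 2:
        raise ValueError("need at least two distinct sentences")
    shuffled = list(original)
    # Reshuffle until the order actually differs from the original.
    while shuffled == original:
        rng.shuffle(shuffled)
    return [(original, 1), (shuffled, 0)]


if __name__ == "__main__":
    story = ["Tom woke up late.", "He missed the bus.", "He walked to school."]
    for sents, label in make_order_examples(story, random.Random(0)):
        print(label, sents)
```

Passing an explicit `random.Random` instance keeps the shuffling reproducible, which is convenient when the same negative examples need to be regenerated.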

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
Jian Guan | Zhexin Zhang | Zhuoer Feng | Zitao Liu | Wenbiao Ding | Xiaoxi Mao | Changjie Fan | Minlie Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic metrics are essential for developing natural language generation (NLG) models, particularly for open-ended language generation tasks such as story generation. However, existing automatic metrics are observed to correlate poorly with human evaluation. The lack of standardized benchmark datasets makes it difficult to fully evaluate the capabilities of a metric and fairly compare different metrics. Therefore, we propose OpenMEVA, a benchmark for evaluating open-ended story generation metrics. OpenMEVA provides a comprehensive test suite to assess the capabilities of metrics, including (a) the correlation with human judgments, (b) the generalization to different model outputs and datasets, (c) the ability to judge story coherence, and (d) the robustness to perturbations. To this end, OpenMEVA includes both manually annotated stories and auto-constructed test examples. We evaluate existing metrics on OpenMEVA and observe that they correlate poorly with human judgments, fail to recognize discourse-level incoherence, and lack inferential knowledge (e.g., causal order between events), generalization ability, and robustness. Our study presents insights for developing NLG models and metrics in further research.
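
Capability (a) above, correlation with human judgments, can be sketched in a few lines of Python. The abstract does not say which correlation coefficient is used, so the choice of Pearson's r here is an assumption, and the function name `pearson` and the sample scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical metric scores and human ratings for five stories.
    metric_scores = [0.2, 0.4, 0.5, 0.7, 0.9]
    human_ratings = [1, 2, 2, 4, 5]
    print(pearson(metric_scores, human_ratings))
```

A value near 1 would indicate that the metric ranks stories much as human annotators do, while a value near 0 would indicate little linear agreement.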

2020

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian Guan | Minlie Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite the success of existing referenced metrics (e.g., BLEU and MoverScore), they correlate poorly with human judgments for open-ended text generation such as story or dialog generation because of the notorious one-to-many issue: there are many plausible outputs for the same input, which may differ substantially in wording or semantics from the limited number of given references. To alleviate this issue, we propose UNION, a learnable UNreferenced metrIc for evaluating Open-eNded story generation, which measures the quality of a generated story without any reference. Built on top of BERT, UNION is trained to distinguish human-written stories from negative samples and recover the perturbation in negative stories. We propose an approach of constructing negative samples by mimicking the errors commonly observed in existing NLG models, including repeated plots, conflicting logic, and long-range incoherence. Experiments on two story datasets demonstrate that UNION is a reliable measure for evaluating the quality of generated stories, which correlates better with human judgments and is more generalizable than existing state-of-the-art metrics.
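
To make the idea of negative-sample construction concrete, here is a minimal Python sketch of one perturbation type named in the abstract, repeated plots. It is an illustration under stated assumptions, not UNION's actual procedure: the function name `repeat_sentence` and the sentence-level granularity are hypothetical choices made here.

```python
import random


def repeat_sentence(sentences, rng):
    """Overwrite one sentence with a copy of another to mimic a repeated plot.

    Returns the perturbed story and the index of the overwritten sentence.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    # Pick two different positions: the source is copied over the destination.
    src, dst = rng.sample(range(len(sentences)), 2)
    perturbed = list(sentences)
    perturbed[dst] = sentences[src]
    return perturbed, dst


if __name__ == "__main__":
    story = ["Ann lost her keys.", "She searched the house.", "She found them in her bag."]
    print(repeat_sentence(story, random.Random(0)))
```

Returning the overwritten index alongside the perturbed story keeps a record of where the perturbation was applied, which a model trained to recover it would need as supervision.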

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation
Jian Guan | Fei Huang | Zhihao Zhao | Xiaoyan Zhu | Minlie Huang
Transactions of the Association for Computational Linguistics, Volume 8

Story generation, namely, generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we use multi-task learning, which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

2018

Generating Informative Responses with Controlled Sentence Function
Pei Ke | Jian Guan | Minlie Huang | Xiaoyan Zhu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sentence function is a significant factor in achieving the purpose of the speaker, yet it has not been addressed in large-scale conversation generation so far. In this paper, we present a model to generate informative responses with controlled sentence function. Our model utilizes a continuous latent variable to capture various word patterns that realize the expected sentence function, and introduces a type controller to deal with the compatibility of controlling sentence function and generating informative content. Conditioned on the latent variable, the type controller determines the type (i.e., function-related, topic, or ordinary word) of a word to be generated at each decoding position. Experiments show that our model outperforms state-of-the-art baselines and can generate responses with both controlled sentence function and informative content.