Yufei Tian


AmbiPun: Generating Humorous Puns with Ambiguous Context
Anirudh Mittal | Yufei Tian | Nanyun Peng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we propose a simple yet effective way to generate pun sentences that does not require any training on existing puns. Our approach is inspired by humor theories that ambiguity comes from the context rather than the pun word itself. Given a pair of definitions of a pun word, our model first produces a list of related concepts through a reverse dictionary. We then utilize one-shot GPT3 to generate context words and then generate puns incorporating context words from both concepts. Human evaluation shows that our method successfully generates pun 52% of the time, outperforming well-crafted baselines and the state-of-the-art models by a large margin.

Go Back in Time: Generating Flashbacks in Stories with Event Temporal Prompts
Rujun Han | Hong Chen | Yufei Tian | Nanyun Peng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Stories or narratives are comprised of a sequence of events. To compose interesting stories, professional writers often leverage a creative writing technique called *flashback* that inserts past events into current storylines as we commonly observe in novels and plays. However, it is challenging for machines to generate *flashback* as it requires a solid understanding of event **temporal order** (e.g. *feeling hungry* before *eat*, not vice versa), and the creativity to arrange storylines so that earlier events do not always appear first in **narrative order**. Two major issues in existing systems that exacerbate the challenges: 1) temporal bias in pertaining and story datasets that leads to monotonic event temporal orders; 2) lack of explicit guidance that helps machines decide where to insert *flashbacks*. We propose to address these issues using structured storylines to encode events and their pair-wise temporal relations (before, after and vague) as **temporal prompts** that guide how stories should unfold temporally. We leverage a Plan-and-Write framework enhanced by reinforcement learning to generate storylines and stories end-to-end. Evaluation results show that the proposed method can generate more interesting stories with *flashbacks* while maintaining textual diversity, fluency, and temporal coherence.

Zero-shot Sonnet Generation with Discourse-level Planning and Aesthetics Features
Yufei Tian | Nanyun Peng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Poetry generation, and creative language generation in general, usually suffers from the lack of large training data. In this paper, we present a novel framework to generate sonnets that does not require training on poems. We design a hierarchical framework which plans the poem sketch before decoding. Specifically, a content planning module is trained on non-poetic texts to obtain discourse-level coherence; then a rhyme module generates rhyme words and a polishing module introduces imagery and similes for aesthetics purposes. Finally, we design a constrained decoding algorithm to impose the meter-and-rhyme constraint of the generated sonnets. Automatic and human evaluation show that our multi-stage approach without training on poem corpora generates more coherent, poetic, and creative sonnets than several strong baselines.

A Unified Framework for Pun Generation with Humor Principles
Yufei Tian | Divyanshu Sheth | Nanyun Peng
Findings of the Association for Computational Linguistics: EMNLP 2022

We propose a unified framework to generate both homophonic and homographic puns to resolve the split-up in existing works. Specifically, we incorporate three linguistic attributes of puns to the language models: ambiguity, distinctiveness, and surprise. Our framework consists of three parts: 1) a context words/phrases selector to promote the aforementioned attributes, 2) a generation model trained on non-pun sentences to incorporate the context words/phrases into the generation output, and 3) a label predictor that learns the structure of puns which is used to steer the generation model at inference time. Evaluation results on both pun types demonstrate the efficacy of our model over strong baselines.

Paraphrase Generation as Unsupervised Machine Translation
Xiaofei Sun | Yufei Tian | Yuxian Meng | Nanyun Peng | Fei Wu | Jiwei Li | Chun Fan
Proceedings of the 29th International Conference on Computational Linguistics

In this paper, we propose a new paradigm for paraphrase generation by treating the task as unsupervised machine translation (UMT) based on the assumption that there must be pairs of sentences expressing the same meaning in a large-scale unlabeled monolingual corpus. The proposed paradigm first splits a large unlabeled corpus into multiple clusters, and trains multiple UMT models using pairs of these clusters. Then based on the paraphrase pairs produced by these UMT models, a unified surrogate model can be trained to serve as the final model to generate paraphrases, which can be directly used for test in the unsupervised setup, or be finetuned on labeled datasets in the supervised setup. The proposed method offers merits over machine-translation-based paraphrase generation methods, as it avoids reliance on bilingual sentence pairs. It also allows human intervene with the model so that more diverse paraphrases can be generated using different filtering criteria. Extensive experiments on existing paraphrase dataset for both the supervised and unsupervised setups demonstrate the effectiveness the proposed paradigm.


Identifying Distributional Perspectives from Colingual Groups
Yufei Tian | Tuhin Chakrabarty | Fred Morstatter | Nanyun Peng
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

Discrepancies exist among different cultures or languages. A lack of mutual understanding among different colingual groups about the perspectives on specific values or events may lead to uninformed decisions or biased opinions. Thus, automatically understanding the group perspectives can provide essential back-ground for many natural language processing tasks. In this paper, we study colingual groups and use language corpora as a proxy to identify their distributional perspectives. We present a novel computational approach to learn shared understandings, and benchmark our method by building culturally-aware models for the English, Chinese, and Japanese languages. Ona held out set of diverse topics, including marriage, corruption, democracy, etc., our model achieves high correlation with human judgements regarding intra-group values and inter-group differences

HypoGen: Hyperbole Generation with Commonsense and Counterfactual Knowledge
Yufei Tian | Arvind krishna Sridhar | Nanyun Peng
Findings of the Association for Computational Linguistics: EMNLP 2021

A hyperbole is an intentional and creative exaggeration not to be taken literally. Despite its ubiquity in daily life, the computational explorations of hyperboles are scarce. In this paper, we tackle the under-explored and challenging task: sentence-level hyperbole generation. We start with a representative syntactic pattern for intensification and systematically study the semantic (commonsense and counterfactual) relationships between each component in such hyperboles. We then leverage commonsense and counterfactual inference to generate hyperbole candidates based on our findings from the pattern, and train neural classifiers to rank and select high-quality hyperboles. Automatic and human evaluations show that our generation method is able to generate hyperboles with high success rate, intensity, funniness, and creativity.