Xiangyu Peng

2024

Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.

2022

pdf abs
Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning
Xiangyu Peng | Siyan Li | Sarah Wiegreffe | Mark Riedl
Findings of the Association for Computational Linguistics: EMNLP 2022

Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generatingnarratives over time, and critically lack basiccommonsense reasoning. Furthermore, existing methods generally focus only on single-character stories, or fail to track charactersat all. To improve the coherence of generated narratives and to expand the scope ofcharacter-centric narrative generation, we introduce Commonsense-inference Augmentedneural StoryTelling (CAST), a framework forintroducing commonsense reasoning into thegeneration process with the option to model theinteraction between multiple characters. Wefind that our CAST method produces significantly more coherent, on-topic, enjoyable andfluent stories than existing models in both thesingle-character and two-character settings inthree storytelling domains.

Automated storytelling has long captured the attention of researchers for the ubiquity of narratives in everyday life. However, it is challenging to maintain coherence and stay on-topictoward a specific ending when generating narratives with neural language models. In this paper, we introduce Story generation with ReaderModels (StoRM), a framework in which areader model is used to reason about the storyshould progress. A reader model infers whata human reader believes about the concepts,entities, and relations about the fictional storyworld. We show how an explicit reader modelrepresented as a knowledge graph affords the storycoherence and provides controllability in theform of achieving a given story world stategoal. Experiments show that our model produces significantly more coherent and on-topicstories, outperforming baselines in dimensionsincluding plot plausibility and staying on topic

2021

pdf abs
Automatic Story Generation: Challenges and Attempts
Amal Alabdulkarim | Siyan Li | Xiangyu Peng
Proceedings of the Third Workshop on Narrative Understanding

Automated storytelling has long captured the attention of researchers for the ubiquity of narratives in everyday life. The best human-crafted stories exhibit coherent plot, strong characters, and adherence to genres, attributes that current states-of-the-art still struggle to produce, even using transformer architectures. In this paper, we analyze works in story generation that utilize machine learning approaches to (1) address story generation controllability, (2) incorporate commonsense knowledge, (3) infer reasonable character actions, and (4) generate creative language.

2020

pdf abs
Reducing Non-Normative Text Generation from Language Models
Xiangyu Peng | Siyan Li | Spencer Frazier | Mark Riedl
Proceedings of the 13th International Conference on Natural Language Generation

Large-scale, transformer-based language models such as GPT-2 are pretrained on diverse corpora scraped from the internet. Consequently, they are prone to generating non-normative text (i.e. in violation of social norms). We introduce a technique for fine-tuning GPT-2, using a policy gradient reinforcement learning technique and a normative text classifier to produce reward and punishment values. We evaluate our technique on five data sets using automated and human participant experiments. The normative text classifier is 81-90% accurate when compared to gold-standard human judgements of normative and non-normative generated text. Our normative fine-tuning technique is able to reduce non-normative text by 27-61%, depending on the data set.

Co-authors