John Chung

Also published as: John Joon Young Chung

2025

pdf bib abs
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
Melissa Roemmele | John Joon Young Chung | Taewook Kim | Yuqian Sun | Alex Calderwood | Max Kreminski
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Generative AI has established the opportunity to readily transform content from one medium to another. This capability is especially powerful for storytelling, where visual illustrations can illuminate a story originally expressed in text. In this paper, we focus on the task of narrative scene illustration, which involves automatically generating an image depicting a scene in a story. Motivated by recent progress on text-to-image models, we consider a pipeline that uses LLMs as an interface for prompting text-to-image models to generate scene illustrations given raw story text. We apply variations of this pipeline to a prominent story corpus in order to synthesize illustrations for scenes in these stories. We conduct a human annotation task to obtain pairwise quality judgments for these illustrations. The outcome of this process is the SceneIllustrations dataset, which we release as a new resource for future work on cross-modal narrative transformation. Through our analysis of this dataset and experiments modeling illustration quality, we demonstrate that LLMs can effectively verbalize scene knowledge implicitly evoked by story text. Moreover, this capability is impactful for generating and evaluating illustrations.

pdf bib
Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025)
Vishakh Padmakumar | Katy Gero | Thiemo Wambsganss | Sarah Sterman | Ting-Hao Huang | David Zhou | John Chung
Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025)

2023

pdf bib abs
Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions
John Chung | Ece Kamar | Saleema Amershi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversify text generation: 1) logit suppression, which minimizes the generation of languages that have already been frequently generated, and 2) temperature sampling, which flattens the token sampling probability. We found that diversification approaches can increase data diversity but often at the cost of data accuracy (i.e., text and labels being appropriate for the target domain). To address this issue, we examined two human interventions, 1) label replacement (LR), correcting misaligned labels, and 2) out-of-scope filtering (OOSF), removing instances that are out of the user’s domain of interest or to which no considered label applies. With oracle studies, we found that LR increases the absolute accuracy of models trained with diversified datasets by 14.4%. Moreover, we found that some models trained with data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF was not effective in increasing model accuracy, implying the need for future work in human-in-the-loop text data generation.

John Chung

2025

2023

2022

Co-authors

Venues