An Analysis of Scoring Methods for Reranking in Large Language Model Story Generation

Megan Deering, Gerald Penn


Abstract
Outline-conditioned story generation using Large Language Models (LLMs) offers a promising approach for automating narrative creation. Some outline-conditioned story generation methods use automatic scoring during generation to improve story quality. However, current research has shown that automatic scoring is not ideal for assessing story quality. This paper evaluates three proposed automatic story-scoring methods intended to improve the reranking of outputs during generation. These scoring methods leverage different prompting strategies and fine-tuning techniques to enhance the accuracy and relevance of the assessments. By experimenting with these approaches within a beam search framework, we aim to identify the most effective methods for optimizing story-generation outcomes. We find no significant overall difference between these methods in their agreement with human ratings during story generation, and the overall story ratings by human evaluators are average. These findings motivate the need for improved automatic scoring techniques and datasets, while also indicating that simpler, more easily implementable scoring methods for reranking perform comparably to more complex approaches.
Anthology ID:
2025.in2writing-1.10
Volume:
Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, US
Editors:
Vishakh Padmakumar, Katy Gero, Thiemo Wambsganss, Sarah Sterman, Ting-Hao Huang, David Zhou, John Chung
Venues:
In2Writing | WS
Publisher:
Association for Computational Linguistics
Pages:
109–120
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.in2writing-1.10/
Cite (ACL):
Megan Deering and Gerald Penn. 2025. An Analysis of Scoring Methods for Reranking in Large Language Model Story Generation. In Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025), pages 109–120, Albuquerque, New Mexico, US. Association for Computational Linguistics.
Cite (Informal):
An Analysis of Scoring Methods for Reranking in Large Language Model Story Generation (Deering & Penn, In2Writing 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.in2writing-1.10.pdf