Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback

Hannah Rashkin, Elizabeth Clark, Fantine Huot, Mirella Lapata


Abstract
Can LLMs provide support to creative writers by giving meaningful writing feedback? In this paper, we explore the challenges and limitations of model-generated writing feedback by defining a new task, dataset, and evaluation frameworks. To study model performance in a controlled manner, we present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues. We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics. Our analysis shows that current models have strong out-of-the-box behavior in many respects—providing specific and mostly accurate writing feedback. However, models often fail to identify the biggest writing issue in the story and to correctly decide when to offer critical vs. positive feedback.
Anthology ID: 2025.acl-long.1254
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 25827–25847
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1254/
Cite (ACL): Hannah Rashkin, Elizabeth Clark, Fantine Huot, and Mirella Lapata. 2025. Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25827–25847, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback (Rashkin et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1254.pdf