Evaluating Large Language Models for Narrative Topic Labeling

Andrew Piper, Sophie Wu


Abstract
This paper evaluates the effectiveness of large language models (LLMs) for labeling topics in narrative texts, comparing performance across fiction and news genres. Building on prior studies of factual documents, we extend the evaluation to narrative contexts where story content is central. Using a ranked voting system with 200 crowdworkers, we assess participants’ preferences for topic labels by comparing multiple LLM outputs with human annotations. Our findings indicate minimal inter-model variation, with LLMs performing on par with human readers for news and outperforming humans for fiction. We conclude with a case study of 25,000 narrative passages from novels, illustrating the analytical value of LLM topic labels compared to traditional methods. The results highlight the significant promise of LLMs for topic labeling of narrative texts.
Anthology ID:
2025.nlp4dh-1.25
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
281–291
URL:
https://preview.aclanthology.org/moar-dois/2025.nlp4dh-1.25/
DOI:
10.18653/v1/2025.nlp4dh-1.25
Cite (ACL):
Andrew Piper and Sophie Wu. 2025. Evaluating Large Language Models for Narrative Topic Labeling. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 281–291, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluating Large Language Models for Narrative Topic Labeling (Piper & Wu, NLP4DH 2025)
PDF:
https://preview.aclanthology.org/moar-dois/2025.nlp4dh-1.25.pdf