Literary Evidence Retrieval via Long-Context Language Models

Katherine Thai; Mohit Iyyer

Abstract

How well do modern long-context language models understand literary fiction? We explore this question via the task of literary evidence retrieval, repurposing the RELiC dataset of Thai et al. (2022) to construct a benchmark where the entire text of a primary source (e.g., The Great Gatsby) is provided to an LLM alongside literary criticism with a missing quotation from that work. This setting, in which the model must generate the missing quotation, mirrors the human process of literary analysis by requiring models to perform both global narrative reasoning and close textual examination. We curate a high-quality subset of 292 examples through extensive filtering and human verification. Our experiments show that recent reasoning models, such as Gemini 2.5 Pro, can exceed human expert performance (62.5% vs. 50% accuracy). In contrast, the best open-weight model achieves only 29.1% accuracy, highlighting a wide gap in interpretive reasoning between open- and closed-weight models. Despite their speed and apparent accuracy, even the strongest models struggle with nuanced literary signals and overgeneration, signaling open challenges for applying LLMs to literary analysis. We release our dataset and evaluation code to encourage future work in this direction.

Anthology ID: 2025.acl-short.29
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 369–380
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-short.29/
Cite (ACL): Katherine Thai and Mohit Iyyer. 2025. Literary Evidence Retrieval via Long-Context Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 369–380, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Literary Evidence Retrieval via Long-Context Language Models (Thai & Iyyer, ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-short.29.pdf
