DiscoRAG: A Discourse-Aware Agent for Query-Based Summarization of Long Documents

Alexander Chernyavskiy, Lidiia Ostyakova, Dmitry Ilvovsky


Abstract
Query-based summarization of long documents is often tackled with retrieval-augmented generation (RAG). However, conventional RAG models are limited when applied to narrative texts, where crucial evidence is often implicit and distributed. This exposes a distinct class of “discourse-aware” queries that require specialized, structure-aware models. To address this, we introduce DiscoRAG, a framework that leverages Rhetorical Structure Theory (RST). By modeling the document as a discourse tree, DiscoRAG navigates its structure, explicitly using rhetorical relations to focus on and aggregate evidence from globally related segments. Furthermore, our pipeline integrates a classifier that assesses query complexity to dynamically select the most efficient retrieval strategy. We evaluate DiscoRAG against standard and extended-context RAG pipelines on the SQuALITY dataset, which we release augmented with questions requiring deep discourse reasoning and integration of the global narrative. Our results show that this method substantially outperforms these baselines, demonstrating its superior ability to assemble a coherent, contextually rich evidence base by interpreting the global narrative structure rather than relying on local semantic similarity.
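The two ideas in the abstract, routing a query to a retrieval strategy and gathering evidence along rhetorical relations, can be illustrated with a toy sketch. This is not the authors' implementation: the `DiscourseNode` class, the word-overlap scorer, the keyword-based `is_discourse_query` stand-in for a trained classifier, and the relation names are all assumptions made for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DiscourseNode:
    """Hypothetical RST-style node: leaves carry text, inner nodes a relation."""
    relation: str = "span"
    text: str = ""
    children: list[DiscourseNode] = field(default_factory=list)

    def leaves(self) -> list[str]:
        if not self.children:
            return [self.text]
        return [leaf for child in self.children for leaf in child.leaves()]


def overlap(query: str, text: str) -> int:
    """Crude similarity: number of shared lowercase whitespace tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def local_retrieve(tree: DiscourseNode, query: str, k: int = 1) -> list[str]:
    """Flat baseline: rank all leaf segments by overlap with the query."""
    return sorted(tree.leaves(), key=lambda s: overlap(query, s), reverse=True)[:k]


def discourse_retrieve(tree, query, relations=("cause", "elaboration")):
    """Expand the best-matching leaf to the deepest enclosing subtree whose
    relation is in `relations` (assumes leaf texts are unique)."""
    best = local_retrieve(tree, query, k=1)[0]
    chosen = None
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children and node.relation in relations and best in node.leaves():
            chosen = node  # descendants are popped later, so the deepest match wins
        stack.extend(node.children)
    return chosen.leaves() if chosen else [best]


def is_discourse_query(query: str) -> bool:
    """Placeholder for a trained query-complexity classifier (assumption)."""
    return any(word in query.lower().split() for word in ("why", "how"))


def retrieve(tree: DiscourseNode, query: str) -> list[str]:
    """Route the query to the cheaper or the structure-aware strategy."""
    if is_discourse_query(query):
        return discourse_retrieve(tree, query)
    return local_retrieve(tree, query)


# Toy document: a causal pair of segments plus an unrelated segment.
doc = DiscourseNode("joint", children=[
    DiscourseNode("cause", children=[
        DiscourseNode(text="The storm destroyed the bridge."),
        DiscourseNode(text="The villagers could not reach the market."),
    ]),
    DiscourseNode(text="The market sold fresh fish every morning."),
])

why_evidence = retrieve(doc, "why could the villagers not reach the market")
simple_evidence = retrieve(doc, "market fish")
```

In this toy setup the "why" query pulls in the storm segment through the causal relation even though it shares almost no words with the query, while the simple keyword query stays with flat top-1 retrieval.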
Anthology ID:
2026.lrec-main.162
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
2062–2075
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.162/
Cite (ACL):
Alexander Chernyavskiy, Lidiia Ostyakova, and Dmitry Ilvovsky. 2026. DiscoRAG: A Discourse-Aware Agent for Query-Based Summarization of Long Documents. International Conference on Language Resources and Evaluation, main:2062–2075.
Cite (Informal):
DiscoRAG: A Discourse-Aware Agent for Query-Based Summarization of Long Documents (Chernyavskiy et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.162.pdf