Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

Yang Zhong; Diane Litman

Abstract

Detecting factual inconsistency in long document summarization remains challenging, given the complex structure of the source article and the length of the summary. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and uses discourse information to better aggregate the sentence-level scores predicted by NLI models. Our approach improves on different model baselines across several evaluation benchmarks that cover a range of text domains and focus on long document summarization. This underscores the significance of incorporating discourse features when developing models that score long document summaries for factual inconsistency.

Anthology ID: 2025.naacl-long.103
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2050–2073
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.103/
Cite (ACL): Yang Zhong and Diane Litman. 2025. Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2050–2073, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization (Zhong & Litman, NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.103.pdf
