Agent-as-Judge for Factual Summarization of Long Narratives

Yeonseok Jeong; Minsoo Kim; Seung-won Hwang; Byung-Hak Kim

Abstract

Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks as measured by traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore (NFS), the first “Agent-as-a-Judge” framework that evaluates and refines factuality in narrative summarization. By leveraging a Character Knowledge Graph (CKG) extracted from the input narrative, NarrativeFactScore evaluates factuality and provides actionable guidance for refinement, such as identifying missing or erroneous facts. Our experimental results demonstrate that constructing the CKG enables reasoning with 1/3 of the factuality computation used in the prior approach and achieves three times higher correlation with human judgments. Furthermore, refinement with actionable guidance improves the quality of the summary.

Anthology ID: 2025.emnlp-main.1204
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 23602–23619
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1204/
Cite (ACL): Yeonseok Jeong, Minsoo Kim, Seung-won Hwang, and Byung-Hak Kim. 2025. Agent-as-Judge for Factual Summarization of Long Narratives. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23602–23619, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Agent-as-Judge for Factual Summarization of Long Narratives (Jeong et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1204.pdf
Checklist: 2025.emnlp-main.1204.checklist.pdf
