DomainSum: A Hierarchical Benchmark for Fine-Grained Domain Shift in Abstractive Text Summarization

Haohan Yuan, Haopeng Zhang

Abstract
Most research on abstractive summarization focuses on single-domain applications, often neglecting how domain shifts between documents affect performance and the generalization ability of summarization models. To address this gap, we introduce DomainSum, a hierarchical benchmark designed to capture fine-grained domain shifts in abstractive summarization. We categorize these shifts into three levels (genre, style, and topic) and demonstrate through comprehensive benchmark analysis that they follow a hierarchical structure. We further evaluate the domain generalization capabilities of commonly used pre-trained language models (PLMs) and large language models (LLMs) in both in-domain and cross-domain settings. Our benchmark and source code are released at https://github.com/hpzhang94/DomainSum.
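To make the in-domain versus cross-domain evaluation setting concrete, below is a minimal Python sketch: a summarizer is scored against test sets from each domain, so matched train/test domains give in-domain scores and mismatched pairs expose domain shift. The summarize() stub, the toy examples, and the domain names are hypothetical placeholders, not the paper's actual pipeline or data format (see the DomainSum repository for that); only the rouge_score library API is assumed as-is.

from rouge_score import rouge_scorer

def summarize(document: str) -> str:
    # Placeholder: swap in a fine-tuned PLM or a prompted LLM here.
    # A naive lead-sentence baseline keeps the sketch self-contained.
    return document.split(".")[0]

def evaluate(pairs):
    """Average ROUGE F1 over (document, reference_summary) pairs."""
    scorer = rouge_scorer.RougeScorer(
        ["rouge1", "rouge2", "rougeL"], use_stemmer=True
    )
    totals = {k: 0.0 for k in ("rouge1", "rouge2", "rougeL")}
    n = 0
    for doc, ref in pairs:
        scores = scorer.score(ref, summarize(doc))  # (target, prediction)
        for k in totals:
            totals[k] += scores[k].fmeasure
        n += 1
    return {k: v / n for k, v in totals.items()}

if __name__ == "__main__":
    # Hypothetical toy data standing in for two benchmark domains.
    news = [("The court ruled today. Details followed.",
             "Court issues ruling.")]
    dialogue = [("A: Hi. B: Meeting moved to 3pm.",
                 "Meeting rescheduled to 3pm.")]
    for name, split in [("news", news), ("dialogue", dialogue)]:
        print(name, evaluate(split))

In the full protocol, one would run this loop once per (training domain, test domain) pair, yielding a score matrix whose diagonal reflects in-domain performance and whose off-diagonal cells quantify generalization under domain shift.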
Anthology ID:
2025.findings-naacl.118
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2219–2231
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.118/
Cite (ACL):
Haohan Yuan and Haopeng Zhang. 2025. DomainSum: A Hierarchical Benchmark for Fine-Grained Domain Shift in Abstractive Text Summarization. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2219–2231, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
DomainSum: A Hierarchical Benchmark for Fine-Grained Domain Shift in Abstractive Text Summarization (Yuan & Zhang, Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.118.pdf