Benchmarking LLMs on Semantic Overlap Summarization

John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, Santu Karmaker


Abstract
Semantic Overlap Summarization (SOS) is a multi-document summarization task focused on extracting the common information shared across alternative narratives, a capability that is critical for trustworthy generation in domains such as news, law, and healthcare. We benchmark popular Large Language Models (LLMs) on SOS and introduce PrivacyPolicyPairs (3P), a new dataset of 135 high-quality samples from privacy policy documents, which complements existing resources and broadens domain coverage. Using the TELeR prompting taxonomy, we evaluate nearly one million LLM-generated summaries across two SOS datasets and conduct human evaluation on a curated subset. Our analysis reveals strong prompt sensitivity, identifies which automatic metrics align most closely with human judgments, and provides new baselines for future SOS research.
Anthology ID:
2025.emnlp-main.1692
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
33340–33361
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1692/
Cite (ACL):
John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, and Santu Karmaker. 2025. Benchmarking LLMs on Semantic Overlap Summarization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33340–33361, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Benchmarking LLMs on Semantic Overlap Summarization (Salvador et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1692.pdf
Checklist:
 2025.emnlp-main.1692.checklist.pdf