Event-based evaluation of abstractive news summarization

Huiling You, Samia Touileb, Lilja Øvrelid, Erik Velldal


Abstract
An abstractive summary of a news article conveys its most important information in condensed form. Evaluation of summaries generated by language models relies heavily on human-authored summaries as gold references, typically by computing overlapping units or similarity scores between the two. News articles report events, and ideally so should their summaries. In this work, we propose to evaluate the quality of abstractive summaries by calculating the overlap of events between generated summaries, reference summaries, and the original news articles. We experiment on a richly annotated Norwegian dataset comprising both event annotations and summaries written by expert human annotators. Our approach provides more insight into the event information contained in the summaries.
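The idea of scoring a summary by the events it shares with a reference summary or with the source article can be illustrated with a small set-overlap computation. The sketch below is an assumption for illustration only, not the authors' implementation: it represents each event as a hypothetical (event type, trigger) pair, treats two events as matching only on exact equality, and reports precision, recall and F1 of one event set against another. The event types and Norwegian trigger words in the example are invented.

```python
from __future__ import annotations

from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event representation: an event type plus its trigger word."""
    event_type: str
    trigger: str


def overlap_scores(candidate: set[Event], reference: set[Event]) -> tuple[float, float, float]:
    """Precision, recall and F1 of candidate events against reference events.

    Assumes exact matching of (event_type, trigger) pairs; a real evaluation
    might instead use lemmas, argument overlap or softer matching criteria.
    """
    matched = len(candidate & reference)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: events in an article, a human reference summary,
# and a model-generated summary.
article = {
    Event("Attack", "angrep"),
    Event("Die", "drept"),
    Event("Arrest", "pågrepet"),
    Event("Meet", "møte"),
}
reference = {Event("Attack", "angrep"), Event("Die", "drept")}
generated = {Event("Attack", "angrep"), Event("Arrest", "pågrepet")}

# Generated vs. reference: how well the summary matches the human summary's events.
print(overlap_scores(generated, reference))
# Generated vs. article: precision indicates how many summary events are grounded
# in the article; recall indicates how much of the article's event content is covered.
print(overlap_scores(generated, article))
```

Comparing against both the reference and the article, as above, lets a generated summary get credit for article events that the human reference happened to omit, which pure reference-based overlap would miss.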
Anthology ID: 2025.gem-1.43
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Kaustubh Dhole, Miruna Clinciu
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 504–510
URL: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.43/
Cite (ACL):
Huiling You, Samia Touileb, Lilja Øvrelid, and Erik Velldal. 2025. Event-based evaluation of abstractive news summarization. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 504–510, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Event-based evaluation of abstractive news summarization (You et al., GEM 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.43.pdf