Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

Demian Gholipour Ghalandari

doi:10.18653/v1/W17-4511

Abstract

The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show how to scale up to larger input document collections by selecting a small number of sentences from each document before constructing the summary. Experiments were conducted on the DUC2004 dataset for multi-document summarization. We observe higher performance than the original model, on par with more complex state-of-the-art methods.
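
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: sentences are represented by plain lowercase word counts (a stand-in for whatever weighting the paper uses), preselection simply keeps the first few sentences of each document, and the summary is limited by a word budget. The names vectorize, cosine, preselect, greedy_summary, n_first and max_words are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def preselect(documents, n_first=2):
    """Keep the first n_first sentences of each document (assumed heuristic)."""
    return [s for doc in documents for s in doc[:n_first]]


def greedy_summary(sentences, max_words=20):
    """Greedily grow a summary whose summed vector is closest to the centroid."""
    vectors = [vectorize(s) for s in sentences]

    # Centroid: sum of all candidate sentence vectors.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    selected = []
    summary_vec = Counter()
    best_score = 0.0
    used_words = 0
    while True:
        best_i = None
        for i, v in enumerate(vectors):
            if i in selected:
                continue
            if used_words + len(sentences[i].split()) > max_words:
                continue
            # Score the whole candidate summary, not the sentence alone.
            score = cosine(summary_vec + v, centroid)
            if score > best_score:
                best_score, best_i = score, i
        if best_i is None:
            # No remaining sentence fits the budget or improves the score.
            break
        selected.append(best_i)
        summary_vec = summary_vec + vectors[best_i]
        used_words += len(sentences[best_i].split())
    return [sentences[i] for i in selected]
```

In this sketch, each document is a list of sentence strings; calling preselect on the collection and passing the result to greedy_summary returns the chosen sentences in selection order. Because each step scores the summary as a whole, a sentence that repeats already covered words tends to add less than one that covers new parts of the centroid.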

Anthology ID: W17-4511
Volume: Proceedings of the Workshop on New Frontiers in Summarization
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 85–90
URL: https://aclanthology.org/W17-4511
DOI: 10.18653/v1/W17-4511
Cite (ACL): Demian Gholipour Ghalandari. 2017. Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization. In Proceedings of the Workshop on New Frontiers in Summarization, pages 85–90, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization (Gholipour Ghalandari, 2017)
PDF: https://preview.aclanthology.org/ingestion-script-update/W17-4511.pdf
