Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, Katja Markert
Abstract
Automatic sentence summarization produces a shorter version of a sentence while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is unwittingly exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets.
- Anthology ID:
- 2020.acl-main.452
- Volume:
- Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Editors:
- Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5032–5042
- URL:
- https://aclanthology.org/2020.acl-main.452
- DOI:
- 10.18653/v1/2020.acl-main.452
- Cite (ACL):
- Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, and Katja Markert. 2020. Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5032–5042, Online. Association for Computational Linguistics.
- Cite (Informal):
- Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction (Schumann et al., ACL 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.acl-main.452.pdf
- Code:
- raphael-sch/HC_Sentence_Summarization
- Data:
- SNLI
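
The abstract describes an objective that combines fluency and similarity to the source, and a search for a high-scoring summary by discrete optimization over extracted words. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it assumes hill climbing as the optimizer, and `fluency`, `similarity`, `objective`, `hill_climb`, and the per-word `weights` are hypothetical toy stand-ins for the language-model and semantic-similarity scores.

```python
import random


def fluency(indices):
    # Toy stand-in for a language-model score (assumption): the share of
    # neighbouring selected words that were also neighbours in the source.
    if len(indices) < 2:
        return 1.0
    adjacent = sum(1 for a, b in zip(indices, indices[1:]) if b - a == 1)
    return adjacent / (len(indices) - 1)


def similarity(indices, weights):
    # Toy stand-in for semantic similarity (assumption): the share of the
    # total word weight that the selected words retain.
    total = sum(weights)
    return sum(weights[i] for i in indices) / total if total else 0.0


def objective(indices, weights):
    # Assumed combination of the two aspects; a plain sum keeps it simple.
    return fluency(indices) + similarity(indices, weights)


def hill_climb(source, weights, length, steps=200, seed=0):
    """Pick `length` words from `source`, keeping their original order."""
    n = len(source)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and len(source)")
    rng = random.Random(seed)
    selected = sorted(rng.sample(range(n), length))
    best = objective(selected, weights)
    for _ in range(steps):
        unselected = [i for i in range(n) if i not in selected]
        if not unselected:
            break
        # Propose swapping one selected word for one unselected word.
        out_i = rng.choice(selected)
        in_i = rng.choice(unselected)
        candidate = sorted([i for i in selected if i != out_i] + [in_i])
        score = objective(candidate, weights)
        # Accept the swap only if the objective strictly improves.
        if score > best:
            selected, best = candidate, score
    return [source[i] for i in selected], best


source = "the quick brown fox jumps over the lazy dog near the river".split()
weights = [len(w) for w in source]  # crude importance proxy (assumption)
summary, score = hill_climb(source, weights, length=5)
print(" ".join(summary), round(score, 3))
```

Because the summary length is fixed in advance, every candidate in the search has the same number of words, which fits the abstract's point that systems should be compared within output length brackets.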