Abstract
Automatic summarization is typically treated as a one-to-one mapping from document to summary. Documents such as news articles, however, are structured and often cover multiple topics or aspects, and readers may be interested in only some of them. We tackle the task of aspect-based summarization, where, given a document and a target aspect, our models generate a summary centered around that aspect. We induce latent document structure jointly with an abstractive summarization objective, and train our models in a scalable synthetic setup. In addition to improvements in summarization over topic-agnostic baselines, we demonstrate the benefit of the learnt document structure: we show that our models (a) learn to accurately segment documents by aspect; (b) can leverage the structure to produce both abstractive and extractive aspect-based summaries; and (c) benefit from the structure particularly when summarizing long documents. All results transfer from synthetic training documents to natural news articles from CNN/Daily Mail and RCV1.
- Anthology ID: P19-1630
- Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month: July
- Year: 2019
- Address: Florence, Italy
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 6263–6273
- URL: https://aclanthology.org/P19-1630
- DOI: 10.18653/v1/P19-1630
- Cite (ACL): Lea Frermann and Alexandre Klementiev. 2019. Inducing Document Structure for Aspect-based Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6263–6273, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Inducing Document Structure for Aspect-based Summarization (Frermann & Klementiev, ACL 2019)
- PDF: https://preview.aclanthology.org/starsem-semeval-split/P19-1630.pdf
- Code: ColiLea/aspect_based_summarization
- Data: CNN/Daily Mail, RCV1