Abstract
We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state-of-the-art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer for a new domain than previous work suggests, and bring into question the benefit of deep learning models for summarization in those domains that do have massive datasets (i.e., news). At the same time, they raise important questions for new research in summarization; namely, new forms of sentence representations or external knowledge sources are needed that are better suited to the summarization task.
- Anthology ID:
- D18-1208
- Original:
- D18-1208v1
- Version 2:
- D18-1208v2
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1818–1828
- URL:
- https://aclanthology.org/D18-1208
- DOI:
- 10.18653/v1/D18-1208
- Cite (ACL):
- Chris Kedzie, Kathleen McKeown, and Hal Daumé III. 2018. Content Selection in Deep Learning Models of Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1818–1828, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Content Selection in Deep Learning Models of Summarization (Kedzie et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/D18-1208.pdf
- Code
- kedz/nnsum + additional community code
- Data
- New York Times Annotated Corpus