Summary Level Training of Sentence Rewriting for Abstractive Summarization

Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee


Abstract
As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of first extracting salient sentences from a document and then paraphrasing the selected ones to generate a summary. However, existing models in this framework mostly rely on sentence-level rewards or suboptimal labels, causing a mismatch between the training objective and the evaluation metric. In this paper, we present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning. In addition, we incorporate BERT into our model, making good use of its natural language understanding ability. In extensive experiments, we show that the combination of our proposed model and training procedure obtains new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets. We also demonstrate that it generalizes better on the DUC-2002 test set.
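The abstract contrasts sentence-level rewards with a summary-level ROUGE signal. The sketch below is a rough illustration of that difference under stated assumptions, not the authors' implementation. It assumes a simplified unigram ROUGE-1 F1 (whitespace tokens, no stemming) as a stand-in for the ROUGE variant used in the paper. It also assumes that the summary-level reward at each extraction step is the change in ROUGE of the partial summary. The function names `rouge1_f1`, `independent_sentence_rewards` and `summary_level_rewards` are hypothetical, and the "independent" scorer is a simplified sentence-level stand-in.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: lowercased whitespace tokens, no stemming."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clipped unigram overlap (multiset intersection of token counts).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0  # also covers empty candidate or reference
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def independent_sentence_rewards(selected: list[str], reference: str) -> list[float]:
    """Hypothetical sentence-level stand-in: each sentence is scored on its own,
    so selecting the same good sentence twice is rewarded twice."""
    return [rouge1_f1(sent, reference) for sent in selected]


def summary_level_rewards(selected: list[str], reference: str) -> list[float]:
    """Assumed summary-level scheme: the reward at step t is the gain in ROUGE of
    the partial summary, so the rewards sum to the final summary's score."""
    rewards, prev = [], 0.0
    for t in range(1, len(selected) + 1):
        score = rouge1_f1(" ".join(selected[:t]), reference)
        rewards.append(score - prev)
        prev = score
    return rewards


if __name__ == "__main__":
    reference = "the cat sat on the mat a dog barked loudly"
    selected = ["the cat sat on the mat", "the cat sat on the mat"]  # redundant pick
    print(independent_sentence_rewards(selected, reference))
    print(summary_level_rewards(selected, reference))
```

In this toy example, the duplicated sentence earns about 0.75 twice under independent scoring. Under the summary-level scheme, its marginal reward is about -0.20, because it adds length without adding new overlap with the reference. Telescoping the per-step differences keeps the total return equal to the ROUGE of the finished summary. That is the property that aligns training with the evaluation metric.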
Anthology ID:
D19-5402
Volume:
Proceedings of the 2nd Workshop on New Frontiers in Summarization
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Lu Wang, Jackie Chi Kit Cheung, Giuseppe Carenini, Fei Liu
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
10–20
URL:
https://aclanthology.org/D19-5402
DOI:
10.18653/v1/D19-5402
Cite (ACL):
Sanghwan Bae, Taeuk Kim, Jihoon Kim, and Sang-goo Lee. 2019. Summary Level Training of Sentence Rewriting for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pages 10–20, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Summary Level Training of Sentence Rewriting for Abstractive Summarization (Bae et al., 2019)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/D19-5402.pdf
Attachment:
D19-5402.Attachment.zip
Data
CNN/Daily Mail
New York Times Annotated Corpus