Abstractive Document Summarization without Parallel Data

Nikola I. Nikolov, Richard Hahnloser


Abstract
Abstractive summarization typically relies on large collections of paired articles and summaries. However, in many cases, parallel data is scarce and costly to obtain. We develop an abstractive summarization system that relies only on large collections of example summaries and non-matching articles. Our approach consists of an unsupervised sentence extractor, which selects salient sentences to include in the final summary, and a sentence abstractor, trained on pseudo-parallel and synthetic data, which paraphrases each of the extracted sentences. We evaluate our method extensively on two tasks: the CNN/DailyMail benchmark, where we compare our approach to fully supervised baselines, and the novel task of automatically generating a press release from a scientific journal article, for which our system is well suited. We show promising performance on both tasks, without relying on any article-summary pairs.
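The two-stage design described in the abstract (extract salient sentences, then paraphrase each one) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify how salience is scored, so the sketch swaps in a simple bag-of-words centroid similarity, and the names `split_sentences`, `extract`, and `summarize` are hypothetical. The abstractor is passed in as an arbitrary callable standing in for the trained paraphrasing model.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ., ! or ? followed by whitespace (illustrative assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence: str) -> Counter:
    # Lowercased word counts for one sentence.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract(article: str, k: int = 3) -> List[str]:
    # Stand-in unsupervised extractor: score each sentence by its similarity
    # to the document centroid (NOT necessarily the paper's scoring method).
    sentences = split_sentences(article)
    vectors = [bag_of_words(s) for s in sentences]
    centroid: Counter = Counter()
    for vector in vectors:
        centroid.update(vector)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    # Keep the top-k sentences, restored to document order.
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(article: str, abstractor: Callable[[str], str], k: int = 3) -> str:
    # Apply the abstractor (a placeholder for a trained paraphraser)
    # independently to each extracted sentence.
    return " ".join(abstractor(s) for s in extract(article, k))
```

Because the abstractor only ever sees single sentences, it can in principle be trained on sentence-level pairs rather than on aligned article-summary pairs, which is the property the abstract emphasizes.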
Anthology ID:
2020.lrec-1.819
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6638–6644
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.819
Cite (ACL):
Nikola I. Nikolov and Richard Hahnloser. 2020. Abstractive Document Summarization without Parallel Data. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6638–6644, Marseille, France. European Language Resources Association.
Cite (Informal):
Abstractive Document Summarization without Parallel Data (Nikolov & Hahnloser, LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.819.pdf
Code:
ninikolov/low_resource_summarization
Data:
CNN/Daily Mail