SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Bogdan Gliwa; Iwona Mochol; Maciej Biesek; Aleksander Wawer

doi:10.18653/v1/D19-5409

Abstract

This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than the model-generated summaries of news – in contrast with human evaluators’ judgement. This suggests that a challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogues corpus, manually annotated with abstractive summarizations, which can be used by the research community for further studies.

Anthology ID: D19-5409
Volume: Proceedings of the 2nd Workshop on New Frontiers in Summarization
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Lu Wang, Jackie Chi Kit Cheung, Giuseppe Carenini, Fei Liu
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 70–79
URL: https://preview.aclanthology.org/fix-sig-urls/D19-5409/
DOI: 10.18653/v1/D19-5409
Cite (ACL): Bogdan Gliwa, Iwona Mochol, Maciej Biesek, and Aleksander Wawer. 2019. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pages 70–79, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization (Gliwa et al., 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/D19-5409.pdf
Code: additional community code
Data: SAMSum
