Konstantin Chernis
2020
SumTitles: a Summarization Dataset with Low Extractiveness
Valentin Malykh | Konstantin Chernis | Ekaterina Artemova | Irina Piontkovskaya
Proceedings of the 28th International Conference on Computational Linguistics
Existing dialogue summarization corpora are significantly extractive. We introduce a methodology for evaluating dataset extractiveness and present a new low-extractive corpus of movie dialogues for abstractive text summarization, along with a baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present the alignment algorithm that we use to construct the corpus.
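To make the notion of extractiveness concrete, the sketch below computes one common proxy: the share of summary n-grams that also occur verbatim in the source dialogue. This is an illustrative assumption, not the evaluation methodology of the paper, which the abstract does not specify. The function names `ngrams` and `copied_ngram_ratio`, the whitespace tokenization, and the default `n=2` are all hypothetical choices made for this example.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def copied_ngram_ratio(source, summary, n=2):
    """Fraction of summary n-grams that also appear in the source.

    Values near 1.0 suggest a highly extractive summary; values near 0.0
    suggest an abstractive one. Tokenization is a plain lowercase
    whitespace split, which is an assumption made for simplicity.
    """
    source_ngrams = set(ngrams(source.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        # Summary is shorter than n tokens: nothing to compare.
        return 0.0
    copied = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return copied / len(summary_ngrams)


if __name__ == "__main__":
    dialogue = "the cat sat on the mat"
    print(copied_ngram_ratio(dialogue, "the cat sat"))
    print(copied_ngram_ratio(dialogue, "a dog barked"))
```

Averaging such a ratio over all dialogue and summary pairs gives a rough corpus-level score; a low-extractive corpus in the sense of the abstract would be expected to score low on a measure of this kind.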