Konstantin Chernis
2020
SumTitles: a Summarization Dataset with Low Extractiveness
Valentin Malykh, Konstantin Chernis, Ekaterina Artemova, Irina Piontkovskaya
Proceedings of the 28th International Conference on Computational Linguistics
Existing dialogue summarization corpora are highly extractive. We introduce a methodology for evaluating dataset extractiveness and present a new low-extractiveness corpus of movie dialogues for abstractive text summarization, along with a baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present the alignment algorithm used to construct the corpus.