TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki


Abstract
Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves similar performance as models trained on a dataset of summaries created manually. In addition, we validated the quality of our summaries by human experts.
Anthology ID:
P19-1204
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2125–2131
Language:
URL:
https://aclanthology.org/P19-1204
DOI:
10.18653/v1/P19-1204
Bibkey:
Cite (ACL):
Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, and David Konopnicki. 2019. TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2125–2131, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks (Lev et al., ACL 2019)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/P19-1204.pdf
Code
 levguy/talksumm
Data
TalkSumm