GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents

Sajad Sotudeh Gharebagh, Arman Cohan, Nazli Goharian


Abstract
This paper presents our methods for the LongSumm 2020: Shared Task on Generating Long Summaries for Scientific Documents, where the task is to generate long summaries given a set of scientific papers provided by the organizers. We explore three main approaches for this task: (1) an extractive approach using a BERT-based summarization model; (2) a two-stage model that additionally includes an abstraction step using BART; and (3) a new multi-tasking approach that incorporates document structure into the summarizer. We found that our new multi-tasking approach outperforms the two other methods by large margins. Among 9 participants in the shared task, our best model ranks first according to ROUGE-1 score (53.11%) while staying competitive in terms of ROUGE-2.
Anthology ID:
2020.sdp-1.41
Volume:
Proceedings of the First Workshop on Scholarly Document Processing
Month:
November
Year:
2020
Address:
Online
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
356–361
URL:
https://aclanthology.org/2020.sdp-1.41
DOI:
10.18653/v1/2020.sdp-1.41
Cite (ACL):
Sajad Sotudeh Gharebagh, Arman Cohan, and Nazli Goharian. 2020. GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents. In Proceedings of the First Workshop on Scholarly Document Processing, pages 356–361, Online. Association for Computational Linguistics.
Cite (Informal):
GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents (Sotudeh Gharebagh et al., sdp 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.sdp-1.41.pdf
Video:
https://slideslive.com/38940737