Abstract
This paper presents our methods for the LongSumm 2020: Shared Task on Generating Long Summaries for Scientific Documents, where the task is to generate long summaries given a set of scientific papers provided by the organizers. We explore three main approaches for this task: (1) an extractive approach using a BERT-based summarization model; (2) a two-stage model that additionally includes an abstraction step using BART; and (3) a new multi-tasking approach that incorporates document structure into the summarizer. We found that our new multi-tasking approach outperforms the other two methods by large margins. Among 9 participants in the shared task, our best model ranks top according to Rouge-1 score (53.11%) while staying competitive in terms of Rouge-2.
- Anthology ID: 2020.sdp-1.41
- Volume: Proceedings of the First Workshop on Scholarly Document Processing
- Month: November
- Year: 2020
- Address: Online
- Venue: sdp
- Publisher: Association for Computational Linguistics
- Pages: 356–361
- URL: https://aclanthology.org/2020.sdp-1.41
- DOI: 10.18653/v1/2020.sdp-1.41
- Cite (ACL): Sajad Sotudeh Gharebagh, Arman Cohan, and Nazli Goharian. 2020. GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents. In Proceedings of the First Workshop on Scholarly Document Processing, pages 356–361, Online. Association for Computational Linguistics.
- Cite (Informal): GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents (Sotudeh Gharebagh et al., sdp 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.sdp-1.41.pdf