Self-Supervised Learning for Contextualized Extractive Summarization
Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang
Abstract
Existing models for extractive summarization are usually trained from scratch with a cross-entropy loss, which does not explicitly capture the global context at the document level. In this paper, we aim to improve this task by introducing three auxiliary pre-training tasks that learn to capture the document-level context in a self-supervised fashion. Experiments on the widely used CNN/DM dataset validate the effectiveness of the proposed auxiliary tasks. Furthermore, we show that after pre-training, a clean model with simple building blocks is able to outperform previous carefully designed state-of-the-art models.
- Anthology ID:
- P19-1214
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2221–2227
- URL:
- https://aclanthology.org/P19-1214
- DOI:
- 10.18653/v1/P19-1214
- Cite (ACL):
- Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, and William Yang Wang. 2019. Self-Supervised Learning for Contextualized Extractive Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2221–2227, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Self-Supervised Learning for Contextualized Extractive Summarization (Wang et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/P19-1214.pdf
- Code:
- hongwang600/Summarization
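The abstract above contrasts plain cross-entropy training with auxiliary tasks whose labels come from the document itself, but it does not spell out the three tasks. The sketch below is a hypothetical illustration of that general idea, not the authors' method or code. It corrupts a document by swapping disjoint pairs of sentences, labels each position 1 if its sentence was moved, and scores per-sentence predictions with the binary cross-entropy loss commonly used for sentence-level extractive labeling. The names `make_switch_example` and `binary_cross_entropy` are invented for this example, and the constant probabilities stand in for a real document-level encoder.

```python
import math
import random
from typing import List, Tuple


def make_switch_example(
    sentences: List[str], num_swaps: int, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Hypothetical self-supervised task: swap `num_swaps` disjoint pairs of
    sentences and label each position 1 if its sentence was moved, else 0."""
    if 2 * num_swaps > len(sentences):
        raise ValueError("need at least 2 * num_swaps sentences")
    corrupted = list(sentences)
    labels = [0] * len(sentences)
    # Sample distinct positions so that no sentence is swapped twice.
    positions = rng.sample(range(len(sentences)), 2 * num_swaps)
    for k in range(num_swaps):
        i, j = positions[2 * k], positions[2 * k + 1]
        corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
        labels[i] = labels[j] = 1
    return corrupted, labels


def binary_cross_entropy(
    probs: List[float], labels: List[int], eps: float = 1e-12
) -> float:
    """Mean per-sentence binary cross-entropy between predicted
    probabilities and 0/1 labels (probabilities clamped for log safety)."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    doc = ["Sentence A.", "Sentence B.", "Sentence C.", "Sentence D."]
    corrupted, labels = make_switch_example(doc, num_swaps=1, rng=random.Random(0))
    # Placeholder: a real model would predict, for each sentence in context,
    # the probability that it was moved. A constant 0.5 is used here.
    probs = [0.5] * len(corrupted)
    print(corrupted, labels, binary_cross_entropy(probs, labels))
```

Swapping disjoint pairs ensures that, for documents with distinct sentences, every position labeled 1 actually holds a different sentence than before. Because the labels are derived from the raw text, such a task needs no reference summaries, which is what makes it usable for pre-training before the supervised extractive objective.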