Leveraging Information Bottleneck for Scientific Document Summarization
Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, Shirui Pan
Abstract
This paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and editing to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. Further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.
- Anthology ID:
- 2021.findings-emnlp.345
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4091–4098
- URL:
- https://aclanthology.org/2021.findings-emnlp.345
- DOI:
- 10.18653/v1/2021.findings-emnlp.345
- Cite (ACL):
- Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, and Shirui Pan. 2021. Leveraging Information Bottleneck for Scientific Document Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4091–4098, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Leveraging Information Bottleneck for Scientific Document Summarization (Ju et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.345.pdf