GenCompareSum: a hybrid unsupervised summarization method using salience

Jennifer Bishop, Qianqian Xie, Sophia Ananiadou


Abstract
Text summarization (TS) is an important NLP task. Pre-trained Language Models (PLMs) have been used to improve the performance of TS. However, PLMs are limited by their need for labelled training data and by their attention mechanism, which often makes them unsuitable for long documents. To this end, we propose a hybrid, unsupervised, abstractive-extractive approach in which we walk through a document, generating salient textual fragments that represent its key points. We then select the most important sentences of the document by choosing those most similar to the generated fragments, with similarity calculated using BERTScore. We evaluate the efficacy of generating and using salient textual fragments to guide extractive summarization on documents from the biomedical and general scientific domains. We compare performance on long and short documents using different generative text models, which are fine-tuned to generate relevant queries or document titles. We show that our hybrid approach outperforms existing unsupervised methods, as well as state-of-the-art supervised methods, despite not needing a vast amount of labelled training data.
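The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fragments are hard-coded rather than produced by a generative model, a simple token-overlap F1 stands in for BERTScore so that the example runs without downloading a model, and summing a sentence's similarity over all fragments is an assumed scoring rule. The names `overlap_f1` and `select_sentences` are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a crude stand-in for a PLM tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_f1(candidate, reference):
    """Token-overlap F1. Assumption: a cheap substitute for BERTScore."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    shared = sum((cand & ref).values())
    if shared == 0:
        return 0.0
    precision = shared / sum(cand.values())
    recall = shared / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_sentences(sentences, fragments, k, similarity=overlap_f1):
    """Score each sentence by its summed similarity to the fragments
    (an assumed scoring rule) and return the top k in document order."""
    scores = [sum(similarity(s, f) for f in fragments) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy document and hand-written fragments (illustrative data only).
sentences = [
    "The study enrolled 120 patients with type 2 diabetes.",
    "Funding was provided by a national research council.",
    "Metformin reduced blood glucose levels in most patients.",
    "The authors thank the anonymous reviewers.",
]
fragments = [
    "does metformin reduce blood glucose",
    "patients with type 2 diabetes",
]

summary = select_sentences(sentences, fragments, k=2)
print(summary)
```

Passing a different `similarity` callable (for example, a wrapper around a BERTScore implementation) would bring the sketch closer to the approach the abstract describes, while the ranking logic stays the same.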
Anthology ID:
2022.bionlp-1.22
Volume:
Proceedings of the 21st Workshop on Biomedical Language Processing
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
220–240
URL:
https://aclanthology.org/2022.bionlp-1.22
DOI:
10.18653/v1/2022.bionlp-1.22
Cite (ACL):
Jennifer Bishop, Qianqian Xie, and Sophia Ananiadou. 2022. GenCompareSum: a hybrid unsupervised summarization method using salience. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 220–240, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
GenCompareSum: a hybrid unsupervised summarization method using salience (Bishop et al., BioNLP 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.bionlp-1.22.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2022.bionlp-1.22.mp4
Code:
jbshp/gencomparesum
Data:
Arxiv HEP-TH citation graph, CORD-19, Pubmed, S2ORC