Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
Qianren Mao, Shaobo Zhao, Jiarui Li, Xiaolei Gu, Shizhu He, Bo Li, Jianxin Li
Abstract
Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised document extractive summarization. However, the traditional two-step paradigm of pre-training and sentence-ranking, creates a gap due to differing optimization objectives. To address this issue, we argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize informative and distinctive sentence representations helps rank significant sentences. To do so, we propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. These fine-tuned sentence embeddings are then utilized in a graph-based ranking algorithm for unsupervised summarization. Our method is a plug-and-play pre-trained model that produces predominant performance for unsupervised summarization frameworks by providing summary-worthy sentence representations. It surpasses heavy BERT- or RoBERTa-based sentence representations in downstream tasks.- Anthology ID:
- 2023.findings-emnlp.328
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4929–4941
- Language:
- URL:
- https://preview.aclanthology.org/ingest_wac_2008/2023.findings-emnlp.328/
- DOI:
- 10.18653/v1/2023.findings-emnlp.328
- Cite (ACL):
- Qianren Mao, Shaobo Zhao, Jiarui Li, Xiaolei Gu, Shizhu He, Bo Li, and Jianxin Li. 2023. Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4929–4941, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders (Mao et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/ingest_wac_2008/2023.findings-emnlp.328.pdf