Two-Stage Graph-Augmented Summarization of Scientific Documents
Rezvaneh Rezapour, Yubin Ge, Kanyao Han, Ray Jeong, Jana Diesner
Abstract
Automatic text summarization helps to digest the vast and ever-growing volume of scientific publications. While transformer-based solutions like BERT and SciBERT have advanced scientific summarization, lengthy documents pose a challenge due to the token limits of these models. To address this issue, we introduce and evaluate a two-stage model that follows an extract-then-compress framework. Our model incorporates a “graph-augmented extraction module” to select order-based salient sentences and an “abstractive compression module” to generate concise summaries. Additionally, we introduce the *BioConSumm* dataset, which focuses on biodiversity conservation, to support underrepresented domains and explore domain-specific summarization strategies. Among the tested models, ours achieves the highest ROUGE-2 and ROUGE-L scores on our newly created dataset (*BioConSumm*) and on the *SUMPUBMED* dataset, which serves as a benchmark in the field of biomedicine.
- Anthology ID:
- 2024.nlp4science-1.5
- Volume:
- Proceedings of the 1st Workshop on NLP for Science (NLP4Science)
- Month:
- November
- Year:
- 2024
- Address:
- Miami, FL, USA
- Editors:
- Lotem Peled-Cohen, Nitay Calderon, Shir Lissak, Roi Reichart
- Venues:
- NLP4Science | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36–46
- URL:
- https://preview.aclanthology.org/landing_page/2024.nlp4science-1.5/
- DOI:
- 10.18653/v1/2024.nlp4science-1.5
- Cite (ACL):
- Rezvaneh Rezapour, Yubin Ge, Kanyao Han, Ray Jeong, and Jana Diesner. 2024. Two-Stage Graph-Augmented Summarization of Scientific Documents. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pages 36–46, Miami, FL, USA. Association for Computational Linguistics.
- Cite (Informal):
- Two-Stage Graph-Augmented Summarization of Scientific Documents (Rezapour et al., NLP4Science 2024)
- PDF:
- https://preview.aclanthology.org/landing_page/2024.nlp4science-1.5.pdf
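The extract-then-compress idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: their extraction module is a trained, graph-augmented neural component and their compression module is abstractive (transformer-based). Below, extraction is approximated by unsupervised centrality over a lexical-similarity sentence graph, and compression is stood in for by simple truncation; all function names and thresholds are illustrative assumptions.

```python
# Hedged sketch of a generic extract-then-compress summarization pipeline.
# NOT the paper's method: extraction here is unsupervised graph centrality,
# and "compression" is a crude truncation stand-in for an abstractive model.
import re
from itertools import combinations

def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def jaccard(a, b):
    """Lexical similarity between two sentences (tokens kept as-is)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def extract(sentences, k=2, threshold=0.1):
    """Stage 1: build a sentence similarity graph and pick the k sentences
    with the highest weighted degree centrality, preserving document order."""
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        w = jaccard(sentences[i], sentences[j])
        if w > threshold:  # edge between similar sentences
            scores[i] += w
            scores[j] += w
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]

def compress(sentence, max_words=12):
    """Stage 2 stand-in: truncate instead of abstractive rewriting."""
    words = sentence.split()
    return " ".join(words[:max_words]) + ("…" if len(words) > max_words else "")

def summarize(text, k=2):
    return " ".join(compress(s) for s in extract(split_sentences(text), k))
```

In the actual paper, stage 1 would score sentences with a graph-augmented neural encoder and stage 2 would paraphrase the extracted sentences with an abstractive transformer; the two-stage structure is what lets the pipeline sidestep the token limits mentioned in the abstract, since only the extracted subset reaches the compression model.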