Thi-Minh-Thu Vu
2025
Beyond the Scientific Document: A Citation-Aware Multi-Granular Summarization Approach with Heterogeneous Graphs
Quoc-An Nguyen
|
Xuan-Hung Le
|
Thi-Minh-Thu Vu
|
Hoang-Quynh Le
Findings of the Association for Computational Linguistics: EMNLP 2025
Scientific summarization remains a challenging task due to the complex characteristics of internal structure and its external relations to other documents. To address this, our proposed model constructs a heterogeneous graph to represent a document and its relevant external citations. This heterogeneous graph enables the model to exploit information across multiple granularities, ranging from fine-grained textual components to the global document structure, and from internal content to external citation context, which facilitates context-aware representations and effectively reduces redundancy. In addition, we develop an effective encoder based on a multi-granularity graph attention mechanism and the triplet loss objective to enhance representation learning performance. Experimental results across three different scenarios consistently demonstrate that our model outperforms existing approaches. Source code is available at: https://github.com/quocanuetcs/CiteHeteroSum.