Shirui Pan


2021

pdf bib
Medical Code Assignment with Gated Convolution and Note-Code Interaction
Shaoxiong Ji | Shirui Pan | Pekka Marttinen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Leveraging Information Bottleneck for Scientific Document Summarization
Jiaxin Ju | Ming Liu | Huan Yee Koh | Yuan Jin | Lan Du | Shirui Pan
Findings of the Association for Computational Linguistics: EMNLP 2021

This paper presents an unsupervised extractive approach to summarize scientific long documents based on the Information Bottleneck principle. Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we extend it to document level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and edit to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. The further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.

2020

pdf bib
Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
Jiaxin Ju | Ming Liu | Longxiang Gao | Shirui Pan
Proceedings of the First Workshop on Scholarly Document Processing

The Scholarly Document Processing (SDP) workshop is to encourage more efforts on natural language understanding of scientific task. It contains three shared tasks and we participate in the LongSumm shared task. In this paper, we describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020) that is an unsupervised text summarization system for multi-document in News domain. Our SciSummPip includes a transformer-based language model SciBERT (Beltagy et al., 2019) for contextual sentence representation, content selection with PageRank (Page et al., 1999), sentence graph construction with both deep and linguistic information, sentence graph clustering and within-graph summary generation. Our work differs from previous method in that content selection and a summary length constraint is applied to adapt to the scientific domain. The experiment results on both training dataset and blind test dataset show the effectiveness of our method, and we empirically verify the robustness of modules used in SciSummPip with BERTScore (Zhang et al., 2019a).