Benjamin Liu

2024

According to the Genre Theory of Linguistics, a summary structure is inherent to certain types of texts. Such structures help readers locate information efficiently within summaries. However, most existing automatic summarization methods overlook summary structure, producing summaries that emphasize the most prominent information while omitting essential details from other sections. The few summarizers that do account for summary structure rely heavily on predefined structure labels in the source documents and ground-truth summaries. To address these shortcomings, we develop Structured Knowledge-Guided Summarization (SKGSum) and its variant, SKGSum-W, neither of which requires structure labels. Instead, these methods rely on a set of automatically extracted summary points to generate summaries. We evaluate the proposed methods on three real-world datasets. The results indicate that our methods not only improve summary quality, as measured by ROUGE and BERTScore, but also broaden the types of documents that can be effectively summarized.

2022

D2GCLF: Document-to-Graph Classifier for Legal Document Classification
Qiqi Wang | Kaiqi Zhao | Robert Amor | Benjamin Liu | Ruofan Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Legal document classification is an essential task in law intelligence to automate the labor-intensive law case filing process. Unlike traditional document classification problems, legal documents should be classified by reasons and facts instead of topics. We propose a Document-to-Graph Classifier (D2GCLF), which extracts facts as relations between key participants in the law case and represents a legal document with four relation graphs. Each graph is responsible for capturing different relations between the litigation participants. We further develop a graph attention network on top of the four relation graphs to classify the legal documents. Experiments on a real-world legal document dataset show that D2GCLF outperforms the state-of-the-art methods in terms of accuracy.

Co-authors

Xianda Zheng 1

Zijian Huang 1

Venues

Findings 2