Qiqi Wang

2026

Large language models (LLMs) are playing an increasingly pivotal role in LegalAI. However, existing benchmarks are primarily tailored for legal professionals, emphasizing deep reasoning and explainability, whereas public-facing legal applications demand outputs that are direct, actionable, and accessible, a need largely overlooked by current evaluation frameworks. To bridge this gap, we propose Pub-LawBench, a public-oriented LegalAI benchmark grounded in legal functionalism and genre analysis. Specifically, we categorize public legal demands into two core tasks: Instant Question Answering and Legal Text Generation. We further introduce three public-oriented evaluation dimensions: legal normativity, content relevance, and format usability, which collectively assess the practical validity and user readiness of model outputs. To reflect how lay users interact with these models in practice, we evaluate 17 LLMs on Pub-LawBench using only simple prompts and Chain-of-Thought under a vanilla inference setting, excluding complex techniques such as RAG or agent-based methods that are inaccessible to non-experts. Experiments reveal limitations of current LLMs in delivering effective public-oriented legal assistance, highlighting the need for more user-centric model development and benchmarking. Our code and datasets are available for review at https://anonymous.4open.science/r/P-LawBench-E565/.

A defence opinion is an essential step in criminal proceedings, yet it has not been systematically formulated or evaluated as a specific LegalAI task. Grounded in legal principles and practice, we formulate this task as generating a structured defence opinion conditioned jointly on an indictment and the defendant’s stated opinion, which often present conflicting claims. We formalize this setting as a dual-perspective generation problem and introduce DefGen-Bench, a benchmark comprising several Chinese criminal cases with expert-reviewed reference defence opinions. We evaluate eight large language models (LLMs) on this task and observe that existing models tend to mirror the defendant’s opinion, thereby overlooking more appropriate defence strategies. To address this challenge, we propose Knowledge-Enhanced Highlighted Indictment (KHI), a legal knowledge–guided input enhancement method applicable to both open- and closed-source LLMs. Experiments demonstrate consistent improvements across all evaluated LLMs, validating the effectiveness of the proposed approach.

Legal case facts are often lengthy, complex, and difficult to process, posing challenges for legal judgment prediction. Although recent advances leverage large language models (LLMs) for legal reasoning, they face high computational costs and information degradation when handling long cases. Previous approaches, such as architectural modifications and text compression methods, reduce computational complexity to some extent but still struggle to effectively capture legally salient information in complex cases. We propose a legal knowledge–adaptive compression framework for long legal judgment prediction that integrates domain-specific legal knowledge to guide adaptive context compression. Our approach selectively retains legally relevant information while reducing redundant or less informative content, enabling efficient and accurate long-context reasoning. We evaluate the proposed framework on four real-world datasets spanning multiple jurisdictions and languages. Experimental results demonstrate that our method outperforms existing approaches in both prediction performance and computational efficiency.

LegalChainReasoner: Grounding Criminal Judicial Opinion Generation via Structured Legal Chains
Weizhe Shi | Qiqi Wang | Yihong Pan | Qian Liu | Kaiqi Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A criminal judicial opinion represents the judge’s disposition of a case, including the decision rationale and sentencing. Automatically generating such opinions can assist in analyzing sentencing consistency and provide judges with references to similar past cases. However, current research typically approaches this task by dividing it into two isolated subtasks: legal reasoning and sentencing prediction. This separation often leads to inconsistency between the reasoning and the predicted sentence, failing to meet real-world judicial requirements. Furthermore, prior studies rely on manually created knowledge to enhance applicability, yet such methods remain limited in practical deployment. To address these limitations and better align with legal practice, we propose a new LegalAI task: Criminal Judicial Opinion Generation, which simultaneously produces both legal reasoning and sentencing decisions. To achieve this, we introduce the LegalChainReasoner framework, which applies structured legal chains to guide the model through comprehensive case assessments. By integrating factual premises, composite legal conditions, and sentencing conclusions, our approach ensures flexible knowledge injection and end-to-end opinion generation. Experiments on real-world, open-source Chinese legal case datasets demonstrate that our method outperforms baseline models.

2024

A summary structure is inherent to certain types of texts according to the Genre Theory of Linguistics. Such structures help readers locate information within summaries efficiently. However, most existing automatic summarization methods overlook the importance of summary structure, resulting in summaries that emphasize the most prominent information while omitting essential details from other sections. While a few summarizers recognize the importance of summary structure, they rely heavily on predefined labels of summary structures in the source documents and ground-truth summaries. To address these shortcomings, we developed Structured Knowledge-Guided Summarization (SKGSum) and its variant, SKGSum-W, which do not require structure labels. Instead, these methods rely on a set of automatically extracted summary points to generate summaries. We evaluate the proposed methods using three real-world datasets. The results indicate that our methods not only improve the quality of summaries, in terms of ROUGE and BERTScore, but also broaden the types of documents that can be effectively summarized.

2022

D2GCLF: Document-to-Graph Classifier for Legal Document Classification
Qiqi Wang | Kaiqi Zhao | Robert Amor | Benjamin Liu | Ruofan Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Legal document classification is an essential task in law intelligence, as it automates the labor-intensive law case filing process. Unlike traditional document classification problems, legal documents should be classified by reasons and facts instead of topics. We propose a Document-to-Graph Classifier (D2GCLF), which extracts facts as relations between key participants in the law case and represents a legal document with four relation graphs. Each graph captures a different type of relation between the litigation participants. We further develop a graph attention network on top of the four relation graphs to classify the legal documents. Experiments on a real-world legal document dataset show that D2GCLF outperforms state-of-the-art methods in terms of accuracy.
