Kaiqi Zhao


2026

Explicit knowledge conflicts, where retrieved contexts contain contradictory information, have become increasingly prevalent as Large Language Models (LLMs) integrate diverse data sources. The core challenge lies in the complexity of entangled narratives and the heterogeneity of conflict cases, which impose excessive demands on the reasoning capabilities of standard models. To address this, we propose Knowledge Conflict Reasoning (KCR), a framework that adjudicates conflicts by structuring the underlying logic. KCR first disentangles conflicting contexts into distinct sets of reasoning traces, utilizing both textual and graph-based representations, to simplify comprehension. It then employs a Reinforcement Learning with Verifiable Rewards (RLVR) paradigm, guiding the model to internalize a reasoning process that maximizes logical consistency while actively suppressing spurious reasoning paths derived from contradictory contexts. Extensive experiments demonstrate that KCR yields substantial improvements: a KCR-enhanced 7B model surpasses the performance of baselines equipped with top-tier closed-source models such as GPT-4o and GPT-5.1.
Despite recent progress, the reasoning capabilities of large multimodal language models (MLLMs) remain fundamentally constrained by static supervision, where fixed prompts, rules, or reward models provide non-adaptive guidance throughout training. Such static signals are often sufficient to enforce output formats, but fail to shape the underlying reasoning process, leading to brittle generalization and performance saturation in complex decision-making tasks. We propose Evo-PI, a principle-centric learning framework that treats reasoning principles as explicit, language-based supervision signals that can be generated, evaluated, and iteratively evolved. Instead of relying on fixed rewards, Evo-PI enables a co-evolutionary loop in which principles guide model reasoning, while model behaviors in turn refine the principles that supervise them. This dynamic alignment mechanism allows supervision to progressively adapt to the model’s reasoning deficiencies. We instantiate Evo-PI in medical visual question answering as a high-stakes testbed requiring structured visual–textual reasoning. Across eight benchmarks and multiple model backbones, Evo-PI consistently improves reasoning accuracy, achieving gains of up to 24.6%. Our results suggest that evolving principle-guided supervision offers a scalable and general paradigm for training expert-aligned reasoning in multimodal language models.
Legal case facts are often lengthy, complex, and difficult to process, posing challenges for legal judgment prediction. Although recent advances leverage large language models (LLMs) for legal reasoning, they face high computational costs and information degradation when handling long cases. Previous approaches, such as architectural modifications and text compression methods, reduce computational complexity to some extent but still struggle to effectively capture legally salient information in complex cases. We propose a legal knowledge–adaptive compression framework for long legal judgment prediction that integrates domain-specific legal knowledge to guide adaptive context compression. Our approach selectively retains legally relevant information while reducing redundant or less informative content, enabling efficient and accurate long-context reasoning. We evaluate the proposed framework on four real-world datasets spanning multiple jurisdictions and languages. Experimental results demonstrate that our method outperforms existing approaches in both prediction performance and computational efficiency.
A criminal judicial opinion represents the judge’s disposition of a case, including the decision rationale and sentencing. Automatically generating such opinions can assist in analyzing sentencing consistency and provide judges with references to past similar cases. However, current research typically approaches this task by dividing it into two isolated subtasks: legal reasoning and sentencing prediction. This separation often leads to inconsistency between the reasoning and the predictions, failing to meet real-world judicial requirements. Furthermore, prior studies rely on manually created knowledge to enhance applicability, yet such methods remain limited in practical deployment. To address these limitations and better align with legal practice, we propose a new LegalAI task: Criminal Judicial Opinion Generation, which simultaneously produces both legal reasoning and sentencing decisions. To achieve this, we introduce the LegalChainReasoner framework, which applies structured legal chains to guide the model through comprehensive case assessments. By integrating factual premises, composite legal conditions, and sentencing conclusions, our approach ensures flexible knowledge injection and end-to-end opinion generation. Experiments on real-world, open-source Chinese legal case datasets demonstrate that our method outperforms baseline models.

2025

Domain adaptation is widely employed in cross-domain sentiment analysis, enabling the transfer of models from label-rich source domains to target domains with few or no labels. However, concerns have been raised regarding the robustness of such methods and their sensitivity to data distribution shift, particularly when the data distributions of the source and target domains differ significantly. To tackle this problem, we introduce CDA^2, a framework for cross-domain adaptation in low-resource sentiment analysis that utilizes counterfactual diffusion augmentation. Specifically, it employs samples derived from domain-relevant word substitutions in source domain samples to guide the diffusion model in generating high-quality counterfactual target domain samples. We adopt a soft absorbing state and an MMD loss during the training stage, and use advanced ODE solvers to expedite the sampling process. Our experiments demonstrate that CDA^2 generates high-quality target samples and achieves state-of-the-art performance in cross-domain sentiment analysis.
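For readers unfamiliar with the MMD (maximum mean discrepancy) loss mentioned in the abstract above, the following is a minimal, generic sketch of the quantity it measures: the distance between two samples in a kernel feature space. The abstract does not state which kernel, estimator, or feature representation the authors use, so the Gaussian kernel, the biased estimator, and the names `rbf_kernel`, `mean_kernel`, `mmd_squared`, and `bandwidth` are all assumptions made here for illustration only, not the paper's implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 * E[k(s, t)].
    The value is zero when both samples are identical and grows as the
    two samples move apart in the kernel feature space.
    """
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for sentence representations.
    source_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    target_feats = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
    print(mmd_squared(source_feats, source_feats))  # identical samples
    print(mmd_squared(source_feats, target_feats))  # shifted samples
```

Used as a training loss, a term like this would be minimized so that generated target-domain features match the distribution of real target-domain features. In practice it would be computed on batched tensors in an autodiff framework rather than on Python lists.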

2024

A summary structure is inherent to certain types of texts according to the Genre Theory of Linguistics. Such structures aid readers in efficiently locating information within summaries. However, most existing automatic summarization methods overlook the importance of summary structure, resulting in summaries that emphasize the most prominent information while omitting essential details from other sections. While a few summarizers recognize the importance of summary structure, they rely heavily on predefined summary-structure labels in the source documents and ground-truth summaries. To address these shortcomings, we developed Structured Knowledge-Guided Summarization (SKGSum) and its variant, SKGSum-W, which do not require structure labels. Instead, these methods rely on a set of automatically extracted summary points to generate summaries. We evaluate the proposed methods on three real-world datasets. The results indicate that our methods not only improve the quality of summaries, in terms of ROUGE and BERTScore, but also broaden the types of documents that can be effectively summarized.

2022

Legal document classification is an essential task in law intelligence to automate the labor-intensive law case filing process. Unlike traditional document classification problems, legal documents should be classified by reasons and facts instead of topics. We propose a Document-to-Graph Classifier (D2GCLF), which extracts facts as relations between key participants in the law case and represents a legal document with four relation graphs. Each graph is responsible for capturing different relations between the litigation participants. We further develop a graph attention network on top of the four relation graphs to classify the legal documents. Experiments on a real-world legal document dataset show that D2GCLF outperforms the state-of-the-art methods in terms of accuracy.