Xianda Zheng


2026

Despite recent progress, the reasoning capabilities of large multimodal language models (MLLMs) remain fundamentally constrained by static supervision, where fixed prompts, rules, or reward models provide non-adaptive guidance throughout training. Such static signals are often sufficient to enforce output formats, but fail to shape the underlying reasoning process, leading to brittle generalization and performance saturation in complex decision-making tasks. We propose Evo-PI, a principle-centric learning framework that treats reasoning principles as explicit, language-based supervision signals that can be generated, evaluated, and iteratively evolved. Instead of relying on fixed rewards, Evo-PI enables a co-evolutionary loop in which principles guide model reasoning, while model behaviors in turn refine the principles that supervise them. This dynamic alignment mechanism allows supervision to progressively adapt to the model’s reasoning deficiencies. We instantiate Evo-PI in medical visual question answering as a high-stakes testbed requiring structured visual–textual reasoning. Across eight benchmarks and multiple model backbones, Evo-PI consistently improves reasoning accuracy, achieving gains of up to 24.6%. Our results suggest that evolving principle-guided supervision offers a scalable and general paradigm for training expert-aligned reasoning in multimodal language models.
Explicit knowledge conflicts, where retrieved contexts contain contradictory information, have become increasingly prevalent as Large Language Models (LLMs) integrate diverse data sources. The core challenge lies in the complexity of entangled narratives and the heterogeneity of conflict cases, which impose excessive demands on the reasoning capabilities of standard models. To address this, we propose Knowledge Conflict Reasoning (KCR), a framework that adjudicates conflicts by structuring the underlying logic. KCR first disentangles conflicting contexts into distinct sets of reasoning traces, utilizing both textual and graph-based representations, to simplify comprehension. It then employs a Reinforcement Learning with Verifiable Rewards (RLVR) paradigm, guiding the model to internalize a reasoning process that maximizes logical consistency while actively suppressing spurious reasoning paths derived from contradictory contexts. Extensive experiments demonstrate that KCR yields substantial improvements: a KCR-enhanced 7B model surpasses the performance of baselines equipped with top-tier closed-source models such as GPT-4o and GPT-5.1.

2024

A summary structure is inherent to certain types of texts according to the Genre Theory of Linguistics. Such structures help readers locate information within summaries efficiently. However, most existing automatic summarization methods overlook summary structure, producing summaries that emphasize the most prominent information while omitting essential details from other sections. The few summarizers that do account for summary structure rely heavily on predefined structure labels in the source documents and ground-truth summaries. To address these shortcomings, we developed Structured Knowledge-Guided Summarization (SKGSum) and its variant, SKGSum-W, neither of which requires structure labels. Instead, these methods rely on a set of automatically extracted summary points to generate summaries. We evaluate the proposed methods on three real-world datasets. The results indicate that our methods not only improve summary quality, as measured by ROUGE and BERTScore, but also broaden the types of documents that can be effectively summarized.