Chenhao Li
2026
Multi-Hop Knowledge Editing via Critic-Guided Multi-Agent Reasoning
Xudong Li | Yuhang Tian | Dandan Song | Zhijing Wu | Shuhao Zhang | Jun Yang | Yongyu Huo | Changzhi Zhou | Xinyu Zhang | Chenhao Li | Huipeng Ma | Luan Zhang | Yan Xu | Qian Liu
Findings of the Association for Computational Linguistics: ACL 2026
Knowledge within large language models (LLMs) inevitably lags behind an evolving world, motivating knowledge editing methods that update facts without expensive retraining. In multi-hop knowledge editing, models must not only recall updated facts but also correctly propagate them through multi-step reasoning chains. However, most existing approaches rely on unidirectional, feed-forward pipelines that decompose questions and retrieve edited facts in a rigid hop-wise sequence. This design is brittle: a minor retrieval error or logical mismatch at an early hop can become a silent failure that cascades to the final answer without an explicit recovery mechanism. To address this limitation, we propose Critic-Guided Multi-Agent Reasoning for Knowledge Editing (CARE), a framework for closed-loop post-edit reasoning. A Critic agent performs chain-level verification by checking both global coherence and step-wise correctness, and triggers bounded backtracking for iterative self-correction, while a Selector agent supplies high-fidelity, low-noise candidate pools from the edit store to enable effective revision. Experiments on MQuAKE-2002 and MQuAKE-hard demonstrate that CARE effectively mitigates error propagation, achieving new state-of-the-art results.
Subgraph-Guided Executable Logical Form Generation for Knowledge Base Question Answering
Yuhang Tian | Dandan Song | Zhijing Wu | Changzhi Zhou | Jun Yang | Huipeng Ma | Chenhao Li | Luan Zhang | Yading Li | Xudong Li | Shenxi Liu | Jing Jiang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have shown great potential in Knowledge Base Question Answering (KBQA) via semantic parsing. However, existing retrieval-augmented approaches typically retrieve entities and relations in isolation based solely on semantic similarity, ignoring the structural information of the Knowledge Base (KB) and the question. To address this limitation, we propose SELF-KBQA (Subgraph-Guided Executable Logical Form Generation), a novel framework that empowers LLMs to generate logical forms conditioned on structurally aligned and semantically relevant subgraphs. Specifically, we introduce a structure-aware subgraph retrieval stage that ranks candidate subgraphs by aligning them with the question’s structure, along with semantic relevance. Subsequently, we employ a token-budgeted evidence condensation strategy to distill the top-ranked subgraphs into compact contexts for the generation stage. Extensive experiments on GrailQA, WebQSP, and GraphQuestions demonstrate that SELF-KBQA achieves state-of-the-art performance.
Static Models, Dynamic World: A Unified Perspective on Temporal Perception in Large Language Models
Chenhao Li | Dandan Song | Changzhi Zhou | Jun Yang | Yuhang Tian | Huipeng Ma | Guangyuan Feng | Luan Zhang | Xudong Li | Ke Duan
Findings of the Association for Computational Linguistics: ACL 2026
Large language models are trained on static corpora but deployed in a dynamic world, leading to systematic temporal failures—from mis-anchored expressions and inconsistent timelines to hallucinated future events, stale world knowledge, and related issues. Existing surveys on temporal knowledge graphs, retrieval-augmented generation, hallucination, and knowledge editing cover only isolated fragments of this space: they are typically task-centric and do not offer a holistic theoretical account of how frozen LLMs represent and reason about time. This survey provides a unified perspective on temporal reasoning in LLMs. We formalize temporal queries in an information-theoretic framework based on the parametric reachability of temporal premises and answers, which induces four temporal information regimes corresponding to internal reasoning, answer recency, premise anchoring, and genuine world indeterminacy. Under this lens, we delineate the landscape of temporal failure modes, consolidate methodologies for diagnosing temporal deficiencies, and synthesize mitigation approaches into a coherent design space. Together, these contributions provide a systematic roadmap toward reliable time-aware large language models.
MedQPA-Gen: Medical Question Proposing and Answering for Report Generation
Weijie Liang | Xiyue Zhu | Ruike Zhu | Chenhao Li | Cheng Tang | Zhiyu Liu | Zhihua Gong | Shirui Luo | Yudu Li | Volodymyr Kindratenko
Findings of the Association for Computational Linguistics: ACL 2026
Medical report generation from medical images is a vital AI task that assists doctors with diagnosis and marks a significant step toward general AI-powered medical systems. However, previous methods either fail to optimize factual accuracy or depend heavily on expert preference data. To overcome these challenges, we propose MedQPA, an automatic and generalizable report evaluation technique that uses question proposing and answering to enable controllable, structured reasoning grounded in medical domain knowledge and the factual correctness of the report. Additionally, we design MedQPA-Gen, a medical report generation pipeline that maximizes the MedQPA score through prompt engineering and reinforcement learning with MedQPA as a reward signal. We demonstrate that MedQPA is an accurate evaluation metric that closely correlates with human preferences. More importantly, MedQPA-Gen achieves higher human preference scores and better performance on downstream tasks. Our code is open-source and available at https://github.com/MedQPA-gen/MedQPA-gen.
2025
CompKBQA: Component-wise Task Decomposition for Knowledge Base Question Answering
Yuhang Tian | Dandan Song | Zhijing Wu | Pan Yang | Changzhi Zhou | Jun Yang | Hao Wang | Huipeng Ma | Chenhao Li | Luan Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Knowledge Base Question Answering (KBQA) aims to extract accurate answers from a Knowledge Base (KB). Traditional Semantic Parsing (SP)-based methods are widely used but struggle with complex queries. Recently, large language models (LLMs) have shown promise in improving KBQA performance. However, generating error-free logical forms remains challenging, as skeleton, topic entity, and relation errors still occur frequently. To address these challenges, we propose CompKBQA (Component-wise Task Decomposition for Knowledge Base Question Answering), a novel framework that optimizes the process of fine-tuning an LLM to generate logical forms by enabling the LLM to progressively learn relevant sub-tasks such as skeleton generation, topic entity generation, and relevant relation generation. Additionally, we propose R3, which retrieves and incorporates KB information into the process of logical form generation. Experimental evaluations on two benchmark KBQA datasets, WebQSP and CWQ, demonstrate that CompKBQA achieves state-of-the-art performance, highlighting the importance of task decomposition and KB-aware learning.