Ranran Bu


2026

Knowledge Base Question Answering (KBQA) aims to answer natural language queries accurately by retrieving and reasoning over large-scale structured knowledge bases (KBs). Advanced semantic parsing-based methods powered by large language models (LLMs) achieve superior performance by transforming questions into structured queries, i.e., logical forms (LFs). However, LFs generated by LLMs can be non-executable because of the inherent semantic hallucination of LLMs and the complex graph retrieval that the KBQA task requires. To address this challenge, we propose a novel "generate-verify-refine" framework, termed Action-Reflection-Integrated KBQA (ARI-KBQA), for reliable LF generation. ARI-KBQA introduces a dual-module cooperative architecture. First, an action generator is trained to produce initial query paths using a hop-by-hop reasoning strategy. Then, a reflection verifier dynamically validates path feasibility by interacting with the KBs. In this way, ARI-KBQA filters out invalid LFs and provides semantic correction feedback to the action generator, which iteratively refines the LFs. Evaluations on standard KBQA benchmarks show that ARI-KBQA significantly improves performance while reducing the search space, especially in complex multi-hop query scenarios.
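The generate-verify-refine loop described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary KB, the names `verify_path` and `generate_verify_refine`, the `propose` callable standing in for the trained action generator, and the feedback string format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Toy KB (assumption): maps (subject, relation) to a set of objects.
KB = Dict[Tuple[str, str], Set[str]]


def verify_path(kb: KB, topic: str, path: List[str]) -> Tuple[Set[str], Optional[int]]:
    """Execute the path hop by hop against the KB.

    Returns (answers, None) if every hop is executable, otherwise
    (empty set, index of the first hop that yields no entities).
    """
    frontier = {topic}
    for i, rel in enumerate(path):
        reached: Set[str] = set()
        for ent in frontier:
            reached |= kb.get((ent, rel), set())
        if not reached:
            return set(), i
        frontier = reached
    return frontier, None


def generate_verify_refine(
    kb: KB,
    topic: str,
    propose: Callable[[Optional[str]], List[str]],  # hypothetical stand-in for the action generator
    max_rounds: int = 3,
) -> Tuple[Optional[List[str]], Set[str]]:
    """Ask for a path, verify it against the KB, and feed back the failing hop."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        path = propose(feedback)
        answers, bad_hop = verify_path(kb, topic, path)
        if bad_hop is None:
            return path, answers
        feedback = f"hop {bad_hop} ('{path[bad_hop]}') is not executable"
    return None, set()


# Toy usage: the first proposal contains a hallucinated relation ("money"),
# the second one is corrected after feedback.
toy_kb: KB = {
    ("Paris", "capital_of"): {"France"},
    ("France", "currency"): {"Euro"},
}


def toy_propose(feedback: Optional[str]) -> List[str]:
    return ["capital_of", "money"] if feedback is None else ["capital_of", "currency"]


result = generate_verify_refine(toy_kb, "Paris", toy_propose)
```

In this sketch the verifier rejects a path as soon as one hop returns no entities, so invalid branches are dropped early instead of being executed in full, which is one simple way a verifier could shrink the search space.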

2025

Knowledge base question answering (KBQA) is the task of answering natural language questions using large-scale structured knowledge bases (KBs). Existing semantic parsing-based (SP-based) methods achieve superior performance by directly converting questions into structured logical form (LF) queries using fine-tuned large language models (LLMs). However, these methods struggle to directly generate LFs for complex graph structures, which often leads to non-executable LFs that degrade overall KBQA performance. To address this challenge, we propose KaeDe, a novel generate-then-retrieve method for KBQA. In the generation phase, the approach combines knowledge-aware question decomposition with progressive LF generation; an unsupervised retrieval phase follows. Specifically, the original question is decomposed into simplified, topic entity-centric sub-questions and explanations within the KB context. Path-level LFs are derived from these intermediate expressions and then combined into a comprehensive graph-level LF. Finally, the LF is refined through unsupervised entity and relation retrieval. Experimental results demonstrate that our method achieves state-of-the-art (SOTA) performance on the WebQuestionsSP (WebQSP) and ComplexWebQuestions (CWQ) benchmarks, notably while using fewer model parameters.
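The progression from path-level LFs to a graph-level LF, followed by retrieval-based refinement, can be sketched roughly as follows. This is an illustration under stated assumptions, not KaeDe itself: the S-expression-style `JOIN`/`AND` notation, the helper names, the entity identifier `m.france`, and the use of plain string similarity (Python's `difflib`) as a stand-in for unsupervised relation retrieval are all hypothetical choices.

```python
from difflib import get_close_matches
from typing import List


def path_lf(topic_entity: str, relations: List[str]) -> str:
    """Build a path-level LF by nesting one JOIN per hop from the topic entity."""
    lf = topic_entity
    for rel in relations:
        lf = f"(JOIN {rel} {lf})"
    return lf


def graph_lf(path_lfs: List[str]) -> str:
    """Combine path-level LFs into one graph-level LF (conjunction of paths)."""
    if len(path_lfs) == 1:
        return path_lfs[0]
    return "(AND " + " ".join(path_lfs) + ")"


def refine_relation(generated: str, kb_relations: List[str]) -> str:
    """Map a generated relation name to the most similar relation in the KB.

    String similarity is used here only as a simple, training-free stand-in
    (assumption) for the unsupervised retrieval step.
    """
    match = get_close_matches(generated, kb_relations, n=1, cutoff=0.0)
    return match[0] if match else generated


# Toy usage with made-up KB relations and a made-up entity identifier.
kb_relations = ["location.country.currency_used", "location.country.capital"]
draft_path = ["currency_used"]  # relation name as a generator might emit it
refined_path = [refine_relation(r, kb_relations) for r in draft_path]
lf = graph_lf([path_lf("m.france", refined_path)])
```

Keeping each path as its own small LF before combining means that a retrieval fix to one relation stays local to that path rather than requiring the whole graph-level query to be regenerated.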