Jianpeng Zhou

2026

Chain-of-Relations: Faithful and Efficient LLM Reasoning over Knowledge Graphs via Relation-Centric Exploration
Chenhui Liu | Jianpeng Zhou | Jiahai Wang
Findings of the Association for Computational Linguistics: ACL 2026

Knowledge graph question answering (KGQA) serves as an essential benchmark for KG-enhanced large language models. Among various approaches, agent-based methods have emerged as an effective solution. Existing methods adopt entity-centric exploration that incrementally constructs reasoning paths by selecting and connecting intermediate entities. However, they face two critical limitations. (1) Entity incompleteness vulnerability arises when some intermediate entities lack semantic information beyond opaque IDs, preventing relevance evaluation and leading to discarding valid reasoning paths. (2) Premature entity pruning occurs because beam search retains only top-ranked entities at each step, eliminating candidates before their relevance can be verified. To address these challenges, this paper proposes Chain-of-Relations (CoR) with relation-centric exploration and global entity filtering, reducing dependence on entity completeness and ensuring complete candidate retrieval before constraint validation. Experiments on three benchmark datasets show that CoR consistently outperforms strong baselines in both F1 score and KG-grounded Rate.

Policy-Guided Stepwise Action Planning for Controllable LLM Reasoning
Jianpeng Zhou | Qisheng Hu | Jiahai Wang | Wenya Wang
Findings of the Association for Computational Linguistics: ACL 2026

Steering large language model (LLM) reasoning via high-level reasoning actions offers a promising approach to improve robustness and interpretability. However, existing action-based paradigms, ranging from training-free prompting to static plan retrieval or prediction, often fail to consistently outperform standard generation because their planners tend to degenerate into repetitive loops or fixed patterns. We propose PG-HAP (Policy-Guided High-Level Action Planning), a lightweight stepwise planner–executor framework that learns to select reasoning actions dynamically while keeping the executor LLM fully frozen. The planner is trained with reinforcement learning to optimize answer correctness. To prevent degeneration, we introduce two targeted mechanisms: (i) an Action-Dependency Logit Mask that enforces valid transitions to avoid redundancy, and (ii) an Action Diversity Reward that discourages mode collapse by promoting varied action sequences. Across mathematical and commonsense reasoning benchmarks, PG-HAP improves accuracy over strong baselines while producing less redundant, more adaptive trajectories. This demonstrates that learning high-level planning alone can substantially strengthen reasoning without expensive end-to-end model tuning.
