Zhou Wu

2026

Retrieval-augmented generation (RAG) substantially extends the knowledge boundary of large language models. However, it still faces two major challenges when handling complex reasoning tasks: low context utilization and frequent hallucinations. To address these issues, we propose Self-Correcting RAG, a unified framework that reformulates retrieval and generation as constrained optimization and path planning. On the input side, we move beyond traditional greedy retrieval and, for the first time, formalize context selection as a multi-dimensional multiple-choice knapsack problem (MMKP), thereby maximizing information density and removing redundancy under a strict token budget. On the output side, we introduce a natural language inference (NLI)-guided Monte Carlo Tree Search (MCTS) mechanism, which leverages test-time compute to dynamically explore reasoning trajectories and validate the faithfulness of generated answers. Experiments on six open-domain and multi-hop QA datasets demonstrate that our method significantly improves reasoning accuracy on complex queries while effectively reducing hallucinations, outperforming strong existing baselines. Our code is available at https://github.com/xjiacs/Self-Correcting-RAG.
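The knapsack view of context selection can be illustrated with a minimal sketch. It is not the authors' implementation: it simplifies the multi-dimensional MMKP to a single constraint (tokens), and the names `select_context`, `groups`, and `budget`, as well as the relevance scores, are hypothetical. Each group holds alternative versions of one retrieved passage (for example, full text versus a shorter excerpt), and at most one version per group is chosen.

```python
from typing import List, Tuple

# (tokens, score): cost and assumed relevance of one candidate passage version.
Candidate = Tuple[int, float]


def select_context(groups: List[List[Candidate]], budget: int) -> Tuple[float, List[int]]:
    """Single-constraint multiple-choice knapsack solved by dynamic programming.

    Returns (best_score, picks), where picks[g] is the index chosen in
    group g, or -1 if the group is skipped.
    """
    # best[b] = best (score, picks) over the groups seen so far using at most b tokens.
    best: List[Tuple[float, List[int]]] = [(0.0, []) for _ in range(budget + 1)]
    for group in groups:
        new_best = []
        for b in range(budget + 1):
            # Default option: skip this group entirely.
            score, picks = best[b]
            cand = (score, picks + [-1])
            # Otherwise try each version that fits in the remaining tokens.
            for i, (tokens, value) in enumerate(group):
                if tokens <= b:
                    prev_score, prev_picks = best[b - tokens]
                    if prev_score + value > cand[0]:
                        cand = (prev_score + value, prev_picks + [i])
            new_best.append(cand)
        best = new_best
    return best[budget]


if __name__ == "__main__":
    groups = [
        [(50, 0.9), (20, 0.6)],  # passage A: full text vs. short excerpt
        [(40, 0.8), (15, 0.5)],  # passage B: full text vs. short excerpt
        [(30, 0.4)],             # passage C: single version
    ]
    score, picks = select_context(groups, budget=70)
    print(score, picks)
```

In this toy input, taking the short excerpts of A and B leaves room for C, which a greedy pick of the single highest-scoring passage would not. The full MMKP described in the abstract would track several resource dimensions at once rather than tokens alone.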

Despite the adoption of Large Language Models (LLMs) in legal AI, automated contract revision remains impeded because generic models often treat strict legal constraints as mere suggestions. To address this safety gap, we introduce the Risk-Constrained Bilevel Stackelberg Framework (RCBSF), which models high-stakes revision as a rigorous strategic interaction rather than an open-ended conversation. RCBSF establishes a hierarchical Leader-Follower structure: a Global Prescriptive Agent (GPA), acting as leader, imposes definitive risk budgets, while a follower system, comprising a Constrained Revision Agent (CRA) and a Local Verification Agent (LVA), iteratively optimizes the output within these strict boundaries. We theoretically prove that this bilevel formulation converges to an equilibrium yielding strictly superior utility over unguided methods. Empirically, RCBSF achieves state-of-the-art performance, surpassing iterative baselines with an average Risk Resolution Rate (RRR) of 84.21% and enhanced token efficiency. Our code is available at https://github.com/xjiacs/RCBSF.

Venues: Findings (2)