Xu Wang

Other people with similar names: Xu Wang, Xu Wang, Xu Wang, Xu Wang

Unverified author pages with similar names: Xu Wang


2026

While current LLM agents built on paradigms such as ReAct or Plan-and-Solve have established a strong foundation for step-by-step reasoning, they remain brittle in open-ended environments due to two intrinsic limitations: (1) a closed action space: these frameworks are confined to static, pre-defined toolsets, leaving them unable to adapt when required tools are missing or obsolete; and (2) myopic error recovery: existing agents often get trapped in repetitive local retries, failing to diagnose and rectify root causes in the high-level plan. To overcome these limitations, we introduce CAR (Create And Replan), a novel architecture that incorporates a meta-tool synthesizer to dynamically augment the action space and a reflective replanning mechanism to revise global strategies. To rigorously evaluate our approach, we release ToolHop-Pro, a diagnostic benchmark with systematically pruned toolsets that simulate tool scarcity. Experiments show that CAR significantly outperforms representative baselines, demonstrating its robustness in settings where static agents fail. Code and data are available at https://github.com/Zaiz-77/car.
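The abstract describes two mechanisms: creating a tool when the action space lacks it, and revising the whole plan rather than retrying a failed step. The toy loop below illustrates that control flow only. It is a minimal sketch under stated assumptions, not the authors' implementation: `CreateAndReplanAgent`, `synthesize`, and `replan` are hypothetical names, and the LLM-driven synthesizer and replanner are replaced by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Tool = Callable[[int], int]


@dataclass
class CreateAndReplanAgent:
    """Toy loop: synthesize missing tools, replan globally on failure (hypothetical)."""
    tools: Dict[str, Tool]                          # current, possibly pruned, action space
    synthesize: Callable[[str], Optional[Tool]]     # hypothetical stand-in for a meta-tool synthesizer
    replan: Callable[[List[str], str], List[str]]   # hypothetical stand-in for reflective replanning
    max_replans: int = 2
    trace: List[str] = field(default_factory=list)

    def _get_tool(self, name: str) -> Tool:
        # Open the action space: try to create a missing tool instead of giving up.
        if name not in self.tools:
            tool = self.synthesize(name)
            if tool is None:
                raise LookupError(name)
            self.tools[name] = tool
            self.trace.append(f"created {name}")
        return self.tools[name]

    def run(self, plan: List[str], x: int, goal: Callable[[int], bool]) -> int:
        for _ in range(self.max_replans + 1):
            try:
                value = x
                for step in plan:
                    value = self._get_tool(step)(value)
                if goal(value):
                    return value
                failure = f"goal not met (got {value})"
            except LookupError as err:
                failure = f"cannot obtain tool {err}"
            # Revise the global plan rather than retrying the same step locally.
            self.trace.append(f"replan: {failure}")
            plan = self.replan(plan, failure)
        raise RuntimeError("no successful plan within the replanning budget")


agent = CreateAndReplanAgent(
    tools={"double": lambda v: v * 2},
    synthesize=lambda name: (lambda v: v + 1) if name == "increment" else None,
    replan=lambda plan, failure: ["increment", "double"],
)
result = agent.run(["double", "square"], 3, goal=lambda v: v == 8)
print(result)       # 8
print(agent.trace)  # ['replan: cannot obtain tool square', 'created increment']
```

Here the first plan fails because `square` can be neither found nor synthesized, so the agent replans; the revised plan needs `increment`, which the stand-in synthesizer creates on demand.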
Multi-agent systems (MAS) built on large language models promise improved problem-solving through collaboration, yet they often fail to consistently outperform strong single-agent baselines because errors propagate at inter-agent message handoffs. In this work, we conduct a systematic empirical analysis of such failures and introduce an edge-level error taxonomy that identifies four dominant error types (Data Gap, Signal Corruption, Referential Drift, and Capability Gap) as the primary sources of failure in multi-agent interactions. Building on this taxonomy, we propose AgentAsk, a lightweight clarification module that intervenes at the edge level in MAS to prevent cascading errors. The module strategically applies minimal clarifications at critical points in the system, improving both the accuracy and the efficiency of the overall task. AgentAsk is trained to balance clarification cost, latency, and accuracy; it is also architecture-agnostic and can be easily integrated into existing systems. Evaluated across five benchmarks, AgentAsk consistently improves accuracy, by up to 4.69%, while keeping the additional latency and cost below 10% relative to the baseline MAS, demonstrating high efficiency with minimal overhead. The code is available at https://anonymous.4open.science/r/AgentAsk-3432.
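The abstract does not say how AgentAsk decides when to ask. As an illustration of edge-level intervention only, the sketch below scores each handoff against the four error types and asks a single clarifying question only when the estimated risk outweighs an assumed clarification cost. Everything here is hypothetical: the `Handoff` fields, the rule-based scores in `edge_risks`, and the `cost`/`benefit` threshold stand in for what the paper describes as a trained module.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional, Tuple


class EdgeError(Enum):
    DATA_GAP = "Data Gap"
    SIGNAL_CORRUPTION = "Signal Corruption"
    REFERENTIAL_DRIFT = "Referential Drift"
    CAPABILITY_GAP = "Capability Gap"


@dataclass
class Handoff:
    sender: str
    receiver: str
    payload: Dict[str, str]                        # fields passed along the edge
    required: Tuple[str, ...]                      # fields the receiver needs (hypothetical)
    skill_needed: str = ""                         # skill the next step requires (hypothetical)
    receiver_skills: FrozenSet[str] = frozenset()


VAGUE_REFERENCES = {"it", "this", "that", "the result"}


def edge_risks(h: Handoff) -> Dict[EdgeError, float]:
    """Hypothetical rule-based risk per error type; not the learned AgentAsk policy."""
    present = [k for k in h.required if k in h.payload]
    values = [h.payload[k].strip() for k in present]
    n = max(len(h.required), 1)
    return {
        EdgeError.DATA_GAP: (len(h.required) - len(present)) / n,
        EdgeError.SIGNAL_CORRUPTION: sum(v == "" for v in values) / n,
        EdgeError.REFERENTIAL_DRIFT: sum(v.lower() in VAGUE_REFERENCES for v in values) / n,
        EdgeError.CAPABILITY_GAP: float(
            bool(h.skill_needed) and h.skill_needed not in h.receiver_skills
        ),
    }


def clarify_or_pass(h: Handoff, cost: float = 0.3, benefit: float = 1.0) -> Optional[str]:
    """Ask one minimal question only if the expected benefit exceeds its cost."""
    error, risk = max(edge_risks(h).items(), key=lambda kv: kv[1])
    if risk * benefit <= cost:
        return None  # hand the message over unchanged
    return f"{h.receiver} asks {h.sender} ({error.value}): please clarify before I continue."


ok = Handoff("planner", "solver", {"x": "3", "y": "4"}, ("x", "y"))
vague = Handoff("planner", "solver", {"x": "3", "y": "it"}, ("x", "y"))
print(clarify_or_pass(ok))     # None
print(clarify_or_pass(vague))  # solver asks planner (Referential Drift): please clarify before I continue.
```

Passing clean handoffs through untouched is what keeps the extra cost low in this sketch; raising `cost` trades recall of errors for fewer clarification rounds.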