Guibin Zhang
2026
AgentAsk: Multi-Agent Systems Need to Ask
Bohan Lin | Kuo Yang | Zelin Tan | Yingchuan Lai | Chen Zhang | Guibin Zhang | Xinlei Yu | Miao Yu | Xu Wang | Yudong Zhang | Yang Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-agent systems (MAS) built on large language models promise improved problem-solving through collaboration, yet they often fail to consistently outperform strong single-agent baselines due to error propagation at inter-agent message handoffs. In this work, we conduct a systematic empirical analysis of such failures and introduce an edge-level error taxonomy that identifies four dominant error types: Data Gap, Signal Corruption, Referential Drift, and Capability Gap, as primary sources of failure in multi-agent interactions. Building on this taxonomy, we propose AgentAsk, a lightweight clarification module designed to intervene at the edge level in MAS to prevent cascading errors. The module operates by strategically applying minimal clarifications at critical points within the system, improving the accuracy and efficiency of the overall task. AgentAsk is trained to balance the trade-offs between clarification cost, latency, and accuracy, while it is also architecture-agnostic and can be easily integrated into existing systems. Evaluated across five benchmarks, AgentAsk consistently improves accuracy by up to 4.69%, while keeping latency and extra costs below 10% compared to baseline MAS, showcasing its high efficiency and minimal overhead. The code is available at https://anonymous.4open.science/r/AgentAsk-3432.
EvoRoute: Experience-Driven Self-Routing LLM Agent Systems
Guibin Zhang | Haiyang Yu | Kaiming Yang | Bingli Wu | Fei Huang | Yongbin Li | Shuicheng Yan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical, yet underexplored, trade-off. We formalize this challenge as the Agent System Trilemma: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan | Hejia Geng | Xiaohang Yu | Mulei Zhang | Guancheng Wan | Yifan Zhou | Qiang He | Xiangyuan Xue | Heng Zhou | Yutao Fan | Zhong-Zhi Li | Zaibin Zhang | Guibin Zhang | Chen Zhang | Zhenfei Yin | Philip Torr | Lei Bai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper investigates the scaling behavior of Large Language Model (LLM) reinforcement learning post-training, focusing on mathematical reasoning. Through experiments across the Qwen2.5 series (0.5B to 72B), we characterize how model scale, data, and compute interact. Our analysis yields four key findings: 1. Larger models consistently demonstrate superior compute and data efficiency. 2. The relationship between model performance and training resources follows a predictive power-law across both base and instruction-tuned models. 3. RL learning efficiency exhibits a latent saturation trend with increasing model scale. 4. In data-constrained regimes, performance is primarily driven by the total volume of training data rather than sample uniqueness. These results offer practical guidelines for scaling reasoning capabilities through reinforcement learning post-training.
2025
MasRouter: Learning to Route LLMs for Multi-Agent Systems
Yanwei Yue | Guibin Zhang | Boyang Liu | Guancheng Wan | Kun Wang | Dawei Cheng | Yiyan Qi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-agent systems (MAS) powered by Large Language Models (LLMs) have been demonstrated to push the boundaries of LLM capabilities, yet they often incur significant costs and face challenges in dynamic LLM selection. Current LLM routing methods effectively reduce overhead in single-agent scenarios by customizing LLM selection for each query, but they overlook the critical decisions regarding collaboration modes and agent roles in MAS. In response to this challenge, we first introduce the problem of Multi-Agent System Routing (MASR), which integrates all components of MAS into a unified routing framework. Toward this goal, we propose MasRouter, the first high-performing, cost-effective, and inductive MASR solution. MasRouter employs collaboration mode determination, role allocation, and LLM routing through a cascaded controller network, progressively constructing a MAS that balances effectiveness and efficiency. Extensive experiments demonstrate that MasRouter is (1) high-performing, achieving a 1.8% improvement over the state-of-the-art method on MBPP; (2) economical, reducing overhead by up to 52.07% compared to SOTA methods on HumanEval; and (3) plug-and-play, seamlessly integrating with mainstream MAS frameworks, reducing overhead by 17.21% via customized routing.
G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems
Shilong Wang | Guibin Zhang | Miao Yu | Guancheng Wan | Fanci Meng | Chongye Guo | Kun Wang | Yang Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance for prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can seamlessly combine with mainstream MAS with security guarantees.
NetSafe: Exploring the Topological Safety of Multi-agent System
Miao Yu | Shilong Wang | Guibin Zhang | Junyuan Mao | Chenlong Yin | Qijiong Liu | Kun Wang | Qingsong Wen | Yang Wang
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have fueled significant progress in intelligent Multi-agent Systems (MAS), with expanding academic and industrial applications. However, safeguarding these systems from malicious queries receives relatively little attention, while methods for single-agent safety are challenging to transfer. In this paper, we explore MAS safety from a topological perspective, aiming to identify structural properties that enhance security. To this end, we propose the NetSafe framework, unifying diverse MAS workflows via iterative RelCom interactions to enable generalized analysis. We identify several critical phenomena for MAS under attacks (misinformation, bias, and harmful content), termed Agent Hallucination, Aggregation Safety, and Security Bottleneck. Furthermore, we verify that highly connected and larger systems are more vulnerable to adversarial spread, with task performance in a Star Graph Topology decreasing by 29.7%. In conclusion, our work introduces a new perspective on MAS safety and discovers unreported phenomena, offering insights and posing challenges to the community.
Co-authors
- Guancheng Wan 3
- Kun Wang 3
- Yang Wang 3
- Miao Yu 3
- Zelin Tan 2
- Shilong Wang 2
- Chen Zhang 2
- Lei Bai 1
- Dawei Cheng 1
- Yutao Fan 1
- Hejia Geng 1
- Chongye Guo 1
- Qiang He 1
- Fei Huang 1
- Yingchuan Lai 1
- Yongbin Li 1
- Zhong-Zhi Li 1
- Bohan Lin 1
- Boyang Liu 1
- Qijiong Liu 1
- Junyuan Mao 1
- Fanci Meng 1
- Yiyan Qi 1
- Philip Torr 1
- Xu Wang 1
- Qingsong Wen 1
- Bingli Wu 1
- Xiangyuan Xue 1
- Shuicheng Yan 1
- Kuo Yang 1
- Kaiming Yang 1
- Zhenfei Yin 1
- Chenlong Yin 1
- Xinlei Yu 1
- Haiyang Yu 1
- Xiaohang Yu 1
- Yanwei Yue 1
- Yudong Zhang 1
- Mulei Zhang 1
- Zaibin Zhang 1
- Yifan Zhou 1
- Heng Zhou 1