PEAR: Planner-Executor Agent Robustness Benchmark

Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing
Abstract
Large Language Model (LLM)–based Multi-Agent Systems (MAS) have emerged as a powerful paradigm for tackling complex, multi-step tasks across diverse domains. However, despite their impressive capabilities, MAS remain susceptible to adversarial manipulation. Existing studies typically examine isolated attack surfaces or specific scenarios, leaving MAS vulnerabilities without a holistic characterization. To bridge this gap, we introduce PEAR, a benchmark for systematically evaluating both the utility and vulnerability of planner–executor MAS. While compatible with various MAS architectures, our benchmark focuses on the planner–executor structure, a practical and widely adopted design. Through extensive experiments, we find that (1) a weak planner degrades overall clean-task performance more severely than a weak executor; (2) while a memory module is essential for the planner, incorporating one into the executor yields only marginal improvements in clean-task performance; (3) there exists a trade-off between task performance and robustness; and (4) attacks targeting the planner are particularly effective at misleading the system. These findings offer actionable insights for enhancing the robustness of MAS and lay the groundwork for principled defenses in multi-agent settings.
Anthology ID:
2026.findings-eacl.237
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4547–4567
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.237/
Cite (ACL):
Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, and Yue Xing. 2026. PEAR: Planner-Executor Agent Robustness Benchmark. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4547–4567, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
PEAR: Planner-Executor Agent Robustness Benchmark (Dong et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.237.pdf
Checklist:
2026.findings-eacl.237.checklist.pdf