Towards Self-Improving Error Diagnosis in Multi-Agent Systems

Jiazheng Li; Emine Yilmaz; Bei Chen; Thu Le

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or "LLM-as-a-judge" paradigms, which struggle to pinpoint decisive error steps within extended contexts. In this paper, we introduce ErrorProbe, a self-improving framework for semantic failure attribution that identifies responsible agents and the originating error step. The framework operates via a three-stage pipeline: (1) operationalizing the MAS failure taxonomy to detect local anomalies, (2) performing symptom-driven backward tracing to prune irrelevant context, and (3) employing a specialized multi-agent team (Strategist, Investigator, Arbiter) to validate error hypotheses through tool-grounded execution. Crucially, ErrorProbe maintains a verified episodic memory that updates only when error patterns are confirmed by executable evidence, without the need for annotation. Experiments across the TracerTraj and Who&When benchmarks demonstrate that ErrorProbe significantly outperforms baselines, particularly in step-level localization, while the verified memory enables robust cross-domain transfer without retraining.
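As a rough illustration of two ideas named in the abstract, the sketch below shows symptom-driven backward tracing that prunes steps unrelated to a failure, and a memory that accepts a pattern only when a check confirms it. It is not the authors' implementation: the `Step` record, its `depends_on` links, the `backward_trace` function, the `VerifiedMemory` class, and the `check` callable (a stand-in for tool-grounded execution) are all hypothetical names assumed for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    index: int
    agent: str
    content: str
    # Hypothetical explicit dependency links; a real trace may need these inferred.
    depends_on: list[int] = field(default_factory=list)


def backward_trace(steps: list[Step], symptom_index: int) -> list[Step]:
    """Keep only the steps that the symptom step transitively depends on."""
    by_index = {s.index: s for s in steps}
    keep: set[int] = set()
    stack = [symptom_index]
    while stack:
        i = stack.pop()
        if i in keep:
            continue
        keep.add(i)
        stack.extend(by_index[i].depends_on)
    # Preserve the original trace order in the pruned result.
    return [s for s in steps if s.index in keep]


class VerifiedMemory:
    """Stores an error pattern only when an executable check confirms it."""

    def __init__(self) -> None:
        self.patterns: list[str] = []

    def update(self, pattern: str, check) -> bool:
        # `check` is an assumed zero-argument callable returning True on confirmation.
        if check():
            self.patterns.append(pattern)
            return True
        return False


trace = [
    Step(0, "planner", "split task"),
    Step(1, "coder", "write parser", [0]),
    Step(2, "searcher", "unrelated lookup", [0]),
    Step(3, "tester", "parser test fails", [1]),
]
pruned = backward_trace(trace, symptom_index=3)

memory = VerifiedMemory()
memory.update("parser ignores empty input", check=lambda: True)
memory.update("unconfirmed guess", check=lambda: False)
```

In this toy trace the searcher's step is dropped because the failing test does not depend on it, and only the confirmed pattern enters memory. How ErrorProbe actually derives dependencies or verifies hypotheses is described in the paper itself, not here.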

Anthology ID: 2026.findings-acl.98
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2063–2077
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.98/
Cite (ACL): Jiazheng Li, Emine Yilmaz, Bei Chen, and Thu Le. 2026. Towards Self-Improving Error Diagnosis in Multi-Agent Systems. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2063–2077, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Towards Self-Improving Error Diagnosis in Multi-Agent Systems (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.98.pdf
Checklist: 2026.findings-acl.98.checklist.pdf
