The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge

Ali Keramati, Justin Cheok, Jacob Horne, Mark Warschauer


Abstract
Multi-agent debate systems are typically evaluated only on whether the final answer is correct, overlooking the quality of the intermediate reasoning that debate is designed to produce. This paper studies the relationship between three signals in multi-agent debate: token-level log-probability distributions over reasoning tokens, LLM-as-judge rubric scores assigned to those tokens, and final task accuracy. We examine whether internal confidence signals predict externally evaluated reasoning quality, and whether either signal aligns with task correctness, across three domains: rubric-based scoring, mathematical reasoning, and factual question answering. Our framework pairs a two-agent debate architecture (a Constructor and an Auditor) with an LLM-as-judge that scores each agent’s reasoning along instruction following, justification quality, and evidence grounding, together with a critical-failure flag. Experiments in the rubric-scoring domain reveal a consistent four-phase confidence trajectory and a substantial role asymmetry: confidence aligns with judged reasoning quality roughly twice as strongly for the Constructor as for the Auditor, and confidence-based detection of critical reasoning failures is markedly more reliable for the Constructor (AUROC 0.804) than for the Auditor (0.634). These findings motivate the broader cross-domain investigation proposed in this paper.
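As a rough illustration of the confidence-based failure detection summarized in the abstract, the sketch below reduces each response to a single confidence value and measures how well that value separates judge-flagged critical failures from the rest using AUROC. The abstract does not specify how token log-probabilities are aggregated or which direction of confidence signals failure, so the mean-log-probability aggregation, the "lower confidence suggests failure" scoring, the function names, and the toy data are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average token log-probability; one assumed way to summarize confidence."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (token log-probs, critical-failure flag from the judge).
responses: List[Tuple[List[float], int]] = [
    ([-0.1, -0.2, -0.1], 0),
    ([-0.3, -0.2, -0.4], 0),
    ([-1.5, -2.0, -0.9], 1),
    ([-0.2, -0.25, -0.3], 1),
]

# Assumption: lower confidence suggests failure, so the detector score is the
# negated mean log-probability.
failure_scores = [-mean_logprob(lp) for lp, _ in responses]
failure_flags = [flag for _, flag in responses]
toy_auroc = auroc(failure_scores, failure_flags)
print(f"toy AUROC: {toy_auroc:.3f}")
```

Computing this separately for each role's responses would give one AUROC per agent, which is the kind of comparison the reported Constructor and Auditor figures represent. An AUROC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation.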
Anthology ID:
2026.acl-srw.121
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1361–1375
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.121/
Cite (ACL):
Ali Keramati, Justin Cheok, Jacob Horne, and Mark Warschauer. 2026. The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1361–1375, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge (Keramati et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.121.pdf