Error Detection in Medical Note through Multi Agent Debate

Abdine Maiga, Anoop Shah, Emine Yilmaz


Abstract
Large Language Models (LLMs) have approached human-level performance in text generation and summarization, yet their application in clinical settings remains constrained by potential inaccuracies that could lead to serious consequences. This work addresses the critical safety weaknesses in medical documentation systems by focusing on detecting subtle errors that require specialized medical expertise. We introduce a novel multi-agent debating framework that achieves 78.8% accuracy on medical error detection, significantly outperforming both single-agent approaches and previous multi-agent systems. Our framework leverages specialized LLM agents with asymmetric access to complementary medical knowledge sources (Mayo Clinic and WebMD), engaging them in structured debate to identify inaccuracies in clinical notes. A judge agent evaluates these arguments based solely on their medical reasoning quality, with agent-specific performance metrics incorporated as feedback for developing situation-specific trust models.
Anthology ID:
2025.bionlp-1.12
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
124–135
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.12/
Cite (ACL):
Abdine Maiga, Anoop Shah, and Emine Yilmaz. 2025. Error Detection in Medical Note through Multi Agent Debate. In ACL 2025, pages 124–135, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Error Detection in Medical Note through Multi Agent Debate (Maiga et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.12.pdf
Supplementary material:
 2025.bionlp-1.12.SupplementaryMaterial.txt