3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

Ivan Sviridov, Amina Miftakhova, Tereshchenko Artemiy Vladimirovich, Galina Zubkova, Pavel Blinov, Andrey Savchenko


Abstract
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations that combine accurate diagnosis with professional dialogue remains underexplored. This paper presents **3MDBench** (**M**edical **M**ultimodal **M**ulti-agent **D**ialogue **Bench**mark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through a temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality via an Assessor Agent. It includes 2996 cases across 34 diagnoses from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for widely used open- and closed-source LVLMs. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional neural network into the LVLM’s context boosts F1 by up to 20%. Source code is available at https://github.com/univanxx/3mdbench.
Anthology ID:
2025.emnlp-main.1353
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
26625–26665
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1353/
Cite (ACL):
Ivan Sviridov, Amina Miftakhova, Tereshchenko Artemiy Vladimirovich, Galina Zubkova, Pavel Blinov, and Andrey Savchenko. 2025. 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26625–26665, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark (Sviridov et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1353.pdf
Checklist:
2025.emnlp-main.1353.checklist.pdf