VAIDYA: Validated Agents for Intelligent Diagnosis and Yielded Analysis

Kalash Shah, Gautam Bhutani, Rohitaswa Sarbhangia, J Snehan


Abstract
Recent advances in large language models (LLMs) have demonstrated impressive medical reasoning capabilities. However, current evaluation methods are mostly limited to static case vignettes and multiple-choice questions, which fail to reflect the complexity, uncertainty, and iterative nature of real-world clinical decision-making. To bridge this gap, we propose **DiagBench**, a novel benchmark in which models interact dynamically with an LLM-based patient simulator, querying relevant clinical details to formulate accurate diagnoses. To complement this, we introduce **MedConvBench**, a diagnostic conversation benchmark designed to assess the relevance and quality of model-generated clinical reasoning. To further address the interpretability and alignment challenges of AI-assisted diagnosis, we develop a modular and medically grounded framework called **VAIDYA** that mirrors a physician’s stepwise diagnostic reasoning. This structured approach improves transparency and yields substantial performance gains over base LLMs. Our work takes a critical step toward aligning AI systems with real-world clinical practices by combining dynamic interaction, interpretability, and clinical validation.
Anthology ID:
2026.gem-main.3
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
11–33
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.3/
Cite (ACL):
Kalash Shah, Gautam Bhutani, Rohitaswa Sarbhangia, and J Snehan. 2026. VAIDYA: Validated Agents for Intelligent Diagnosis and Yielded Analysis. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 11–33, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
VAIDYA: Validated Agents for Intelligent Diagnosis and Yielded Analysis (Shah et al., GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.3.pdf