Rohitaswa Sarbhangia
2026
VAIDYA: Validated Agents for Intelligent Diagnosis and Yielded Analysis
Kalash Shah | Gautam Bhutani | Rohitaswa Sarbhangia | J Snehan
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Recent advances in large language models (LLMs) have demonstrated impressive medical reasoning capabilities. However, current evaluation methods are mostly limited to static case vignettes and multiple-choice questions, which fail to reflect the complexity, uncertainty, and iterative nature of real-world clinical decision-making. To bridge this gap, we propose **DiagBench**, a novel benchmark in which models interact dynamically with an LLM-based patient simulator, querying relevant clinical details to formulate accurate diagnoses. To complement this, we introduce **MedConvBench**, a diagnostic conversation benchmark designed to assess the relevance and quality of model-generated clinical reasoning. To further address the interpretability and alignment challenges of AI-assisted diagnosis, we develop **VAIDYA**, a modular and medically grounded framework that mirrors a physician’s stepwise diagnostic reasoning. This structured approach improves transparency and yields substantial performance gains over base LLMs. Our work takes a critical step toward aligning AI systems with real-world clinical practice by combining dynamic interaction, interpretability, and clinical validation.