Dilip K. Prasad
2026
Evaluating LLM-as-a-Judge for Medical Term Simplification
Ioana Buhnila | Aman Sinha | Rohit Agarwal | Dilip K. Prasad | Mathieu Constant
BioNLP 2026
Highly technical medical terms are difficult for patients to understand during fast-paced hospital consultations, leading many of them to rely on Large Language Models (LLMs) for simplified explanations. However, LLMs can produce inaccurate or false information. Since expert evaluation is costly and time-consuming, the LLM-as-a-Judge (LaaJ) approach is increasingly adopted to assess the quality of LLM-generated text. In this paper, we investigate the reliability and robustness of LaaJ for specialized medical knowledge by evaluating the judgment capabilities of six LLMs on three dimensions: correctness, readability, and completeness. We use three judgment setups (Vanilla, Epistemic, and Bias) to probe robustness, and we assess them against human expert annotations to measure alignment. To address the lack of specialized medical benchmarks, we introduce BrainCancerDB, an English dataset of 219 brain cancer terms with 23,652 annotations. Our findings indicate that while LLM-Judges and humans display similar trends when ranking simplified explanations, LLM-Judges tend to be more lenient on correctness, which may have serious implications in medical settings. Additionally, we observe that hallucinations in LaaJ setups can be mitigated by epistemic markers.
2025
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)
Aman Sinha | Raúl Vázquez | Timothee Mickus | Rohit Agarwal | Ioana Buhnila | Patrícia Schmidtová | Federica Gamba | Dilip K. Prasad | Jörg Tiedemann
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)