SubmissionNumber#=%=#43
FinalPaperTitle#=%=#LLMs as Medical Safety Judges: Evaluating Alignment with Human Annotation in Patient-Facing QA
ShortPaperTitle#=%=#
NumberOfPages#=%=#8
CopyrightSigned#=%=#Yella Diekmann
JobTitle#==#
Organization#==#
Abstract#==#The increasing deployment of LLMs in patient-facing medical QA raises concerns about the reliability and safety of their responses. Traditional evaluation methods rely on expert human annotation, which is costly, time-consuming, and difficult to scale. This study explores the feasibility of using LLMs as automated judges for medical QA evaluation. We benchmark LLMs against human annotators across eight qualitative safety metrics and introduce adversarial question augmentation to assess LLMs' robustness in evaluating medical responses. Our findings reveal that while LLMs achieve high accuracy in objective metrics such as scientific consensus and grammaticality, they struggle with more subjective categories like empathy and extent of harm. This work contributes to the ongoing discussion on automating safety assessments in medical AI and informs the development of more reliable evaluation methodologies.
Author{1}{Firstname}#=%=#Yella Leonie
Author{1}{Lastname}#=%=#Diekmann
Author{1}{Username}#=%=#yelladiekmann
Author{1}{Email}#=%=#yella.diekmann@emory.edu
Author{1}{Affiliation}#=%=#Department of Computer Science, Emory University
Author{2}{Firstname}#=%=#Chase M.
Author{2}{Lastname}#=%=#Fensore
Author{2}{Email}#=%=#chase.fensore@emory.edu
Author{2}{Affiliation}#=%=#Department of Computer Science, Emory University
Author{3}{Firstname}#=%=#Rodrigo M.
Author{3}{Lastname}#=%=#Carrillo-Larco
Author{3}{Email}#=%=#rodrigo.martin.carrillo.larco@emory.edu
Author{3}{Affiliation}#=%=#Rollins School of Public Health, Emory University
Author{4}{Firstname}#=%=#Eduard R.
Author{4}{Lastname}#=%=#Castejon Rosales
Author{4}{Email}#=%=#eduard.roberto.castejon.rosales@emory.edu
Author{4}{Affiliation}#=%=#Department of Family and Preventive Medicine, Emory University School of Medicine
Author{5}{Firstname}#=%=#Sakshi
Author{5}{Lastname}#=%=#Shiromani
Author{5}{Email}#=%=#sakshi.shiromani@emory.edu
Author{5}{Affiliation}#=%=#Department of Ophthalmology, Emory University School of Medicine
Author{6}{Firstname}#=%=#Rima
Author{6}{Lastname}#=%=#Pai
Author{6}{Email}#=%=#rima.pai@emory.edu
Author{6}{Affiliation}#=%=#Rollins School of Public Health, Emory University
Author{7}{Firstname}#=%=#Megha
Author{7}{Lastname}#=%=#Shah
Author{7}{Email}#=%=#megha.shah@emory.edu
Author{7}{Affiliation}#=%=#Department of Family and Preventive Medicine, Emory University School of Medicine
Author{8}{Firstname}#=%=#Joyce C.
Author{8}{Lastname}#=%=#Ho
Author{8}{Email}#=%=#joyce.c.ho@emory.edu
Author{8}{Affiliation}#=%=#Department of Computer Science, Emory University
========== èéáğö