Soumya Gayen
2025
Overview of the ArchEHR-QA 2025 Shared Task on Grounded Question Answering from Electronic Health Records
Sarvesh Soni | Soumya Gayen | Dina Demner-Fushman
Proceedings of the 24th Workshop on Biomedical Language Processing
This paper presents an overview of the ArchEHR-QA 2025 shared task, which was organized with the 24th BioNLP Workshop at ACL 2025. The goal of this shared task is to develop automated responses to patients’ questions by generating answers that are grounded in key clinical evidence from patients’ electronic health records (EHRs). A total of 29 teams participated in the task, collectively submitting 75 systems, with 24 teams providing their system descriptions. The submitted systems encompassed diverse architectures (including approaches that select the most relevant evidence prior to answer generation), leveraging both proprietary and open-weight large language models, as well as employing various tuning strategies such as fine-tuning and few-shot learning. In this paper, we describe the task setup, the dataset used, the evaluation criteria, and the baseline systems. Furthermore, we summarize the methodologies adopted by participating teams and present a comprehensive evaluation and analysis of the submitted systems.
Will Gen Z users look for evidence to verify QA System-generated answers?
Soumya Gayen | Dina Demner-Fushman | Deepak Gupta
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
The remarkable results shown by medical question-answering systems lead to their adoption in real-life applications. The systems, however, may misinform the users, even when drawing on scientific evidence to ground the results. The quality of the answers may be verified by the users if they analyze the evidence provided by the systems. User interfaces play an important role in engaging the users. While studies of the user interfaces for biomedical literature search and clinical decision support are abundant, little is known about users’ interactions with medical question answering systems and the impact of these systems on health-related decisions. In a study of several different user interface layouts, we found that only a small number of participants followed the links to verify automatically generated answers, independently of the interface design. The users who followed the links made better health-related decisions.