DKITNLP at ArchEHR-QA 2025: A Retrieval Augmented LLM Pipeline for Evidence-Based Patient Question Answering

Provia Kadusabe, Abhishek Kaushik, Fiona Lawless


Abstract
This paper describes our submission to the BioNLP ACL 2025 Shared Task on grounded Question Answering (QA) from Electronic Health Records (EHRs). The task aims to automatically generate answers to patients’ health-related questions that are grounded in evidence from their clinical notes. We propose a two-stage retrieval pipeline that identifies relevant sentences to guide response generation by a Large Language Model (LLM). Specifically, our approach uses a BioBERT-based bi-encoder for initial retrieval, followed by a re-ranking step using a fine-tuned cross-encoder to improve retrieval precision. The final set of selected sentences serves as input to the Mistral 7B model, which generates answers through few-shot prompting. Our approach achieves an overall score of 31.6 on the test set, outperforming the substantially larger LLaMA 3.3 70B baseline (30.7), which demonstrates the effectiveness of retrieval-augmented generation for grounded QA.
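The retrieve-then-rerank pipeline described in the abstract can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: `embed` (a bag-of-words vector) is a hypothetical stand-in for the BioBERT bi-encoder, `overlap_scorer` is a hypothetical stand-in for the fine-tuned cross-encoder, and the prompt layout in `build_prompt` is assumed. In the actual system the resulting prompt would be sent to Mistral 7B; that call is omitted here.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (hypothetical preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Hypothetical stand-in for the BioBERT bi-encoder: a bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, sentences: Sequence[str], k: int) -> List[int]:
    """Stage 1: encode question and sentences independently, keep the top-k by cosine."""
    q_vec = embed(question)
    scores = [cosine(q_vec, embed(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def overlap_scorer(question: str, sentence: str) -> float:
    """Hypothetical stand-in for the cross-encoder, which scores the (question, sentence)
    pair jointly; here it is simply the share of question tokens the sentence covers."""
    q_toks = set(tokenize(question))
    return len(q_toks & set(tokenize(sentence))) / len(q_toks) if q_toks else 0.0


def rerank(question: str, sentences: Sequence[str], candidates: List[int],
           scorer: Callable[[str, str], float], n: int) -> List[int]:
    """Stage 2: rescore only the shortlisted candidates, keep the best n in note order."""
    best = sorted(candidates, key=lambda i: scorer(question, sentences[i]), reverse=True)[:n]
    return sorted(best)


def build_prompt(question: str, sentences: Sequence[str], selected: List[int],
                 examples: Sequence[Tuple[str, str]]) -> str:
    """Few-shot prompt: worked examples first, then numbered evidence and the question.
    The layout is an assumption, not the authors' template."""
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    evidence = "\n".join(f"[{i}] {sentences[i]}" for i in selected)
    return "\n\n".join(shots + [f"Evidence:\n{evidence}\nQuestion: {question}\nAnswer:"])


# Usage on a toy (invented) clinical note.
note = [
    "Patient was admitted with chest pain.",
    "Troponin levels were elevated on admission.",
    "The patient enjoys gardening.",
    "A cardiac catheterization showed a blocked artery.",
]
question = "Why did the patient need a cardiac catheterization for chest pain?"
shortlist = retrieve(question, note, k=3)                           # [3, 0, 2]
evidence = rerank(question, note, shortlist, overlap_scorer, n=2)   # [0, 3]
prompt = build_prompt(question, note, evidence, [("Example question?", "Example answer.")])
print(prompt)
```

The split mirrors the usual motivation for this design: a bi-encoder scores every note sentence cheaply because question and sentences are encoded independently, while the more expensive pairwise cross-encoder only sees the short list it produces.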
Anthology ID: 2025.bionlp-share.20
Volume: BioNLP 2025 Shared Tasks
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Sarvesh Soni, Dina Demner-Fushman
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 165–170
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-share.20/
Cite (ACL): Provia Kadusabe, Abhishek Kaushik, and Fiona Lawless. 2025. DKITNLP at ArchEHR-QA 2025: A Retrieval Augmented LLM Pipeline for Evidence-Based Patient Question Answering. In BioNLP 2025 Shared Tasks, pages 165–170, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): DKITNLP at ArchEHR-QA 2025: A Retrieval Augmented LLM Pipeline for Evidence-Based Patient Question Answering (Kadusabe et al., BioNLP 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-share.20.pdf