MedEureka: A Medical Domain Benchmark for Multi-Granularity and Multi-Data-Type Embedding-Based Retrieval

Yongqi Fan, Nan Wang, Kui Xue, Jingping Liu, Tong Ruan


Abstract
Embedding-based retrieval (EBR), the mainstream approach in information retrieval (IR), aims to help users obtain relevant information and plays a crucial role in retrieval-augmented generation (RAG) techniques of large language models (LLMs). Numerous methods have been proposed to significantly improve the quality of retrieved content and many generic benchmarks are proposed to evaluate the retrieval abilities of embedding models. However, texts in the medical domain present unique contexts, structures, and language patterns, such as terminology, doctor-patient dialogue, and electronic health records (EHRs). Despite these unique features, specific benchmarks for medical context retrieval are still lacking. In this paper, we propose MedEureka, an enriched benchmark designed to evaluate medical-context retrieval capabilities of embedding models with multi-granularity and multi-data types. MedEureka includes four levels of granularity and six types of medical texts, encompassing 18 datasets, incorporating granularity and data type description to prompt instruction-fine-tuned text embedding models for embedding generation. We also provide the MedEureka Toolkit to support evaluation on the MedEureka test set. Our experiments evaluate state-of-the-art open-source and proprietary embedding models, and fine-tuned classical baselines, providing a detailed performance analysis. This underscores the challenges of using embedding models for medical domain retrieval and the need for further research. Our code and data are released in the repository: https://github.com/JOHNNY-fans/MedEureka.
Anthology ID:
2025.findings-naacl.154
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2825–2851
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.154/
DOI:
Bibkey:
Cite (ACL):
Yongqi Fan, Nan Wang, Kui Xue, Jingping Liu, and Tong Ruan. 2025. MedEureka: A Medical Domain Benchmark for Multi-Granularity and Multi-Data-Type Embedding-Based Retrieval. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2825–2851, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
MedEureka: A Medical Domain Benchmark for Multi-Granularity and Multi-Data-Type Embedding-Based Retrieval (Fan et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.154.pdf