ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment

Ruochen Li, Jun Li, Bailiang Jian, Kun Yuan, Youxiang Zhu


Abstract
Automatically generated radiology reports often receive high scores from existing evaluation metrics but fail to earn clinicians’ trust. This gap reveals fundamental flaws in how current metrics assess the quality of generated reports. We rethink the design and evaluation of these metrics and propose a clinically grounded Meta-Evaluation framework. We define clinically grounded criteria spanning clinical alignment and key metric capabilities, including discrimination, robustness, and monotonicity. Using a fine-grained dataset of ground truth and rewritten report pairs annotated with error types, clinical significance labels, and explanations, we systematically evaluate existing metrics and reveal their limitations in interpreting clinical semantics, such as failing to distinguish clinically significant errors, over-penalizing harmless variations, and lacking consistency across error severity levels. Our framework offers guidance for building more clinically reliable evaluation methods.
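The abstract names monotonicity and discrimination as metric capabilities but does not say how they are computed. The sketch below shows one way such checks could be run. It assumes each rewritten report has a metric score and an integer severity label (0 = harmless, 1 = minor, 2 = clinically significant). The function names and the severity encoding are hypothetical. The pairwise discrimination statistic is a swapped-in Mann-Whitney/AUC-style choice, not the paper's confirmed definition.

```python
from statistics import mean

# Hypothetical helpers: severity levels are assumed integers where a higher value means a worse error.

def mean_by_severity(scores, severities):
    """Group metric scores by severity level and return {level: mean score}, ordered by level."""
    groups = {}
    for score, level in zip(scores, severities):
        groups.setdefault(level, []).append(score)
    return {level: mean(vals) for level, vals in sorted(groups.items())}

def is_monotonic(scores, severities):
    """True if the mean metric score never increases as severity rises (higher score = better report)."""
    means = list(mean_by_severity(scores, severities).values())
    return all(a >= b for a, b in zip(means, means[1:]))

def pairwise_discrimination(scores, severities, low, high):
    """AUC-style statistic: fraction of (low, high) severity pairs where the
    low-severity rewrite scores higher. Ties count as 0.5."""
    lows = [s for s, lvl in zip(scores, severities) if lvl == low]
    highs = [s for s, lvl in zip(scores, severities) if lvl == high]
    if not lows or not highs:
        raise ValueError("need at least one report at each severity level")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in lows for b in highs)
    return wins / (len(lows) * len(highs))

# Illustrative (made-up) scores for six rewritten reports.
scores = [0.92, 0.88, 0.80, 0.75, 0.60, 0.90]
severities = [0, 0, 1, 1, 2, 2]
print(is_monotonic(scores, severities))                  # True
print(pairwise_discrimination(scores, severities, 0, 2))  # 0.75
```

Here the means decrease with severity, so the toy metric passes the monotonicity check. It still ranks one significant-error rewrite (0.90) above a harmless variation (0.88), which lowers the discrimination value to 0.75. This is the kind of failure the abstract describes.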
Anthology ID: 2025.emnlp-main.598
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11823–11837
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.598/
Cite (ACL): Ruochen Li, Jun Li, Bailiang Jian, Kun Yuan, and Youxiang Zhu. 2025. ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 11823–11837, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment (Li et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.598.pdf
Checklist: 2025.emnlp-main.598.checklist.pdf