Michael Moor
2025
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
Jaehoon Yun | Jiwoong Sohn | Jungwoo Park | Hyunjae Kim | Xiangru Tang | Daniel Shao | Yong Hoe Koo | Ko Minhyeok | Qingyu Chen | Mark Gerstein | Michael Moor | Jaewoo Kang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models have shown promise in clinical decision making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is critical in medicine, where identifying and addressing reasoning errors is essential for accurate diagnosis and effective patient care. We introduce Med-PRM, a process reward modeling framework that leverages retrieval-augmented generation to verify each reasoning step against established medical knowledge bases. By verifying intermediate reasoning steps with evidence retrieved from clinical guidelines and literature, our model can precisely assess reasoning quality in a fine-grained manner. Evaluations on five medical QA benchmarks and two open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art performance, improving the performance of base models by up to 13.50%. Moreover, we demonstrate the generality of Med-PRM by integrating it in a plug-and-play fashion with strong policy models such as Meerkat, achieving over 80% accuracy on MedQA for the first time using small-scale models of 8 billion parameters.
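The abstract above describes scoring each intermediate reasoning step against retrieved guideline evidence rather than judging only the final answer. The following is a minimal toy sketch of that idea; the guideline store, the keyword-overlap retriever, and the names `retrieve_guidelines`, `score_step`, and `score_chain` are all illustrative assumptions, not the paper's implementation (which uses a trained reward model and real clinical corpora).

```python
# Toy sketch of guideline-verified process rewards in the spirit of Med-PRM.
# Everything here is illustrative: a real system would use a dense retriever
# over clinical guidelines and a learned process reward model per step.

GUIDELINES = {
    "chest pain": "Evaluate for acute coronary syndrome with ECG and troponin.",
    "fever": "Consider infectious causes; obtain cultures before antibiotics.",
}

def retrieve_guidelines(step: str) -> list[str]:
    """Keyword-match retrieval over a tiny guideline store (stand-in for RAG)."""
    return [text for key, text in GUIDELINES.items() if key in step.lower()]

def score_step(step: str) -> float:
    """Reward 1.0 if the step shares content words with retrieved evidence, else 0.0."""
    for doc in retrieve_guidelines(step):
        overlap = set(step.lower().split()) & set(doc.lower().split())
        if len(overlap) >= 2:  # crude support check
            return 1.0
    return 0.0

def score_chain(steps: list[str]) -> list[float]:
    """Assign a per-step (process-level) reward rather than one outcome reward."""
    return [score_step(s) for s in steps]

steps = [
    "Patient reports chest pain; order ECG and troponin.",
    "Prescribe antibiotics immediately.",
]
print(score_chain(steps))  # → [1.0, 0.0]
```

The per-step reward vector is what lets a verifier localize *which* step went wrong, which is the fine-grained assessment the abstract emphasizes.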
2023
Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting
Benjamin Yan | Ruochen Liu | David Kuo | Subathra Adithan | Eduardo Reis | Stephen Kwak | Vasantha Venugopal | Chloe O’Connell | Agustina Saenz | Pranav Rajpurkar | Michael Moor
Findings of the Association for Computational Linguistics: EMNLP 2023
Automatically generated reports from medical images promise to improve the workflow of radiologists. Existing methods consider an image-to-report modeling task by directly generating a fully-fledged report from an image. However, this conflates the content of the report (e.g., findings and their attributes) with its style (e.g., format and choice of words), which can lead to clinically inaccurate reports. To address this, we propose a two-step approach for radiology report generation. First, we extract the content from an image; then, we verbalize the extracted content into a report that matches the style of a specific radiologist. For this, we leverage RadGraph—a graph representation of reports—together with large language models (LLMs). In our quantitative evaluations, we find that our approach leads to improved performance. Our human evaluation with clinical raters highlights that the AI-generated reports are indistinguishably tailored to the style of individual radiologists despite leveraging only a few examples as context.
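The two-step split described above (extract content first, then verbalize it in a target style) can be sketched as follows. This is a minimal assumption-laden illustration: the `extract_content` and `verbalize` functions and the dict-based findings are stand-ins, whereas the paper uses RadGraph entity–attribute graphs and an LLM conditioned on a few in-context style examples.

```python
# Toy sketch of content/style decoupling for report generation.
# Illustrative only: the paper extracts a RadGraph from the image-side model
# and uses an LLM with few-shot style examples for the verbalization step.

def extract_content(image_findings: dict[str, str]) -> list[tuple[str, str]]:
    """Step 1: content extraction -> (entity, attribute) pairs, RadGraph-style."""
    return list(image_findings.items())

def verbalize(content: list[tuple[str, str]], style: str) -> str:
    """Step 2: render the same content in a radiologist-specific style."""
    if style == "terse":
        return " ".join(f"{entity}: {attr}." for entity, attr in content)
    # default: full-sentence style
    return " ".join(f"The {entity} is {attr}." for entity, attr in content)

findings = {"cardiac silhouette": "enlarged", "lungs": "clear"}
content = extract_content(findings)
print(verbalize(content, "terse"))  # → cardiac silhouette: enlarged. lungs: clear.
```

Because the content pairs are fixed before verbalization, changing the style argument changes only the wording, never the findings themselves, which is the clinical-accuracy argument the abstract makes.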