Jaehoon Yun
2026
Benchmarking Direct Preference Optimization for Medical Large Vision–Language Models
Dain Kim | Jiwoo Lee | Jaehoon Yun | Yong Hoe Koo | Qingyu Chen | Hyunjae Kim | Jaewoo Kang
Findings of the Association for Computational Linguistics: EACL 2026
Large vision-language models (LVLMs) are gaining traction in clinical tasks such as diagnostic support, report generation, and medical question answering. Among post-training techniques, Direct Preference Optimization (DPO) has shown promise in aligning model outputs with human preferences, yet its effectiveness in high-stakes medical contexts remains underexplored. In this work, we present the first systematic evaluation of nine DPO variants applied to two leading medical LVLMs, LLaVA-Med and HuatuoGPT-Vision. We benchmark these models on five curated datasets covering diverse clinical tasks, evaluating them with both automated metrics and expert assessments. Our results show that while DPO improves alignment and reduces severe hallucinations, it yields inconsistent gains over supervised fine-tuning. We further introduce a DPO variant that better handles visual misinterpretations and enhances clinical understanding. These findings reveal both the potential and the limitations of DPO in medical AI. To support future research, we will release all DPO training data, model checkpoints, and expert annotations upon acceptance.
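For context, the nine benchmarked variants all build on the standard DPO objective, which trains the policy to prefer the chosen response over the rejected one relative to a frozen reference model. The following is a minimal PyTorch sketch of that vanilla loss for illustration, not code from the paper; the tensor names and the beta value are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), summed over tokens
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), reference model is frozen
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,                    # strength of the implicit KL constraint (assumed value)
) -> torch.Tensor:
    """Vanilla DPO loss (Rafailov et al., 2023): maximize the margin between the
    implicit rewards of the preferred and dispreferred responses."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the reference model stays frozen, only the policy is updated, and beta controls how far the policy may drift from it; the DPO variants compared in the paper differ in how they modify this basic objective.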
2025
DMIS Lab at ArchEHR-QA 2025: Evidence-Grounded Answer Generation for EHR-based QA via a Multi-Agent Framework
Hyeon Hwang | Hyeongsoon Hwang | Jongmyung Jung | Jaehoon Yun | Minju Song | Yein Park | Dain Kim | Taewhoo Lee | Jiwoong Sohn | Chanwoong Yoon | Sihyeon Park | Jiwoo Lee | Heechul Yang | Jaewoo Kang
Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
Jaehoon Yun | Jiwoong Sohn | Jungwoo Park | Hyunjae Kim | Xiangru Tang | Daniel Shao | Yong Hoe Koo | Ko Minhyeok | Qingyu Chen | Mark Gerstein | Michael Moor | Jaewoo Kang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models have shown promise in clinical decision making, but current approaches struggle to localize and correct errors at specific steps of the reasoning process. This limitation is critical in medicine, where identifying and addressing reasoning errors is essential for accurate diagnosis and effective patient care. We introduce Med-PRM, a process reward modeling framework that leverages retrieval-augmented generation to verify each reasoning step against established medical knowledge bases. By verifying intermediate reasoning steps with evidence retrieved from clinical guidelines and literature, our model can precisely assess reasoning quality in a fine-grained manner. Evaluations on five medical QA benchmarks and two open-ended diagnostic tasks demonstrate that Med-PRM achieves state-of-the-art performance, improving the performance of base models by up to 13.50%. Moreover, we demonstrate the generality of Med-PRM by integrating it in a plug-and-play fashion with strong policy models such as Meerkat, achieving over 80% accuracy on MedQA for the first time with small-scale models of 8 billion parameters.
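The plug-and-play integration described above amounts to best-of-N selection: the policy model samples several reasoning chains, the process reward model scores each step against retrieved evidence, and the highest-scoring chain is kept. The sketch below illustrates that pattern under stated assumptions; the callables (generate, retrieve, score_step) are hypothetical placeholders, and aggregating a chain by its minimum step score is one common PRM convention, not necessarily the paper's exact choice.

```python
from typing import Callable, List

def best_of_n(
    question: str,
    generate: Callable[[str], List[str]],                # policy model -> one chain as a list of steps
    retrieve: Callable[[str], List[str]],                # fetch guideline/literature passages
    score_step: Callable[[str, str, List[str]], float],  # PRM score for one step given evidence
    n: int = 8,
) -> List[str]:
    """Hypothetical plug-and-play use of a stepwise PRM: sample n reasoning
    chains, score every step against retrieved evidence, and return the chain
    whose weakest step scores highest (min-aggregation)."""
    evidence = retrieve(question)
    chains = [generate(question) for _ in range(n)]
    return max(
        chains,
        key=lambda steps: min(score_step(question, s, evidence) for s in steps),
    )
```

Scoring against retrieved evidence is what distinguishes this setup from a generic PRM: each step is judged for consistency with clinical guidelines rather than by the verifier's parametric knowledge alone.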