Chang Min Park
2025
CREPE: Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts
Gihun Cho | Seunghyun Jang | Hanbin Ko | Inhyeok Baek | Chang Min Park
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We introduce CREPE (Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts), a rapid, interpretable, and clinically grounded metric for automated chest X-ray report generation. CREPE uses a domain-specific BERT model fine-tuned with a multi-head regression architecture to predict error counts across six clinically meaningful categories. Trained on a large-scale synthetic dataset of 32,000 annotated report pairs, CREPE demonstrates strong generalization and interpretability. On the expert-annotated ReXVal dataset, CREPE achieves a Kendall’s tau correlation of 0.786 with radiologist error counts, outperforming traditional and recent metrics. CREPE achieves these results with an inference speed approximately 280 times faster than large language model (LLM)-based approaches, enabling rapid and fine-grained evaluation for scalable development of chest X-ray report generation models.
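The multi-head regression design described above can be sketched as follows. This is a minimal illustration, not the released CREPE implementation: the category names, embedding size, and linear heads are assumptions standing in for the fine-tuned BERT encoder and its regression heads.

```python
import numpy as np

# Hypothetical sketch: a shared pooled embedding feeds six independent
# regression heads, one per error category (names assumed, not from the paper).
CATEGORIES = [
    "false_finding", "omission", "location", "severity", "comparison", "other",
]

HIDDEN = 16  # assumed pooled-embedding size for this sketch
rng = np.random.default_rng(0)

# One weight vector and bias per category head (randomly initialized here).
heads = {c: (rng.normal(size=HIDDEN), 0.0) for c in CATEGORIES}

def predict_error_counts(pooled_embedding):
    """Map a pooled report-pair embedding to one non-negative count per category."""
    counts = {}
    for cat, (w, b) in heads.items():
        raw = float(pooled_embedding @ w + b)
        counts[cat] = max(0.0, raw)  # clamp: predicted error counts cannot be negative
    return counts

emb = rng.normal(size=HIDDEN)
preds = predict_error_counts(emb)
```

Because each head regresses a count for one category, the per-category outputs can be read directly as an interpretable error profile rather than a single opaque score.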
2022
RRED: A Radiology Report Error Detector based on Deep Learning Framework
Dabin Min | Kaeun Kim | Jong Hyuk Lee | Yisak Kim | Chang Min Park
Proceedings of the 4th Clinical Natural Language Processing Workshop
A radiology report is the official record of a radiologist's interpretation of a patient's radiographs, and it is a crucial component of the overall medical diagnostic process. However, it can contain various types of errors that can lead to inadequate treatment or delayed diagnosis. To address this problem, we propose a deep learning framework to detect errors in radiology reports. Specifically, our method detects errors between the findings and conclusion sections of chest X-ray reports within a supervised learning framework. To compensate for the scarcity of radiology reports containing errors, we develop an error generator that systematically creates artificial errors in existing reports. In addition, we introduce a Medical Knowledge-enhancing Pre-training to further exploit the abbreviations and key phrases frequently used in the medical domain. We believe this is the first work to propose a deep learning framework for detecting errors in radiology reports based on rich contextual and medical understanding. Validation on our radiologist-synthesized dataset, built on MIMIC-CXR, yields an area under the precision-recall curve (AUPRC) of 0.80 and an area under the ROC curve (AUROC) of 0.95, indicating that our framework can effectively detect errors in real-world radiology reports.
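The error-generator idea in the abstract, creating artificial errors in existing reports to obtain labeled training data, can be illustrated with a toy perturbation rule. The specific cue words and the single negation-flip rule below are illustrative assumptions, not the authors' actual generator, which systematically covers multiple error types.

```python
# Toy sketch of systematic error injection (the perturbation rules here are
# assumed for illustration, not the paper's actual error generator).
NEGATION_FLIPS = {
    "no ": "",         # drop a negation: "no effusion" -> "effusion"
    "without ": "with ",  # invert a negating preposition
}

def inject_negation_error(sentence):
    """Flip the first matching negation cue to create a factual error.

    Returns the (possibly modified) sentence and whether an error was injected.
    """
    for cue, replacement in NEGATION_FLIPS.items():
        if cue in sentence:
            return sentence.replace(cue, replacement, 1), True
    return sentence, False

s, changed = inject_negation_error("no evidence of pleural effusion")
# s == "evidence of pleural effusion"; changed is True
```

Pairing the perturbed sentence with the original report yields a labeled positive example for the error detector, while unperturbed pairs serve as negatives.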