Hanbin Ko


2025

CREPE: Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts
Gihun Cho | Seunghyun Jang | Hanbin Ko | Inhyeok Baek | Chang Min Park
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce CREPE (Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts), a rapid, interpretable, and clinically grounded metric for automated chest X-ray report generation. CREPE uses a domain-specific BERT model fine-tuned with a multi-head regression architecture to predict error counts across six clinically meaningful categories. Trained on a large-scale synthetic dataset of 32,000 annotated report pairs, CREPE demonstrates strong generalization and interpretability. On the expert-annotated ReXVal dataset, CREPE achieves a Kendall’s tau correlation of 0.786 with radiologist error counts, outperforming traditional and recent metrics. CREPE achieves these results with an inference speed approximately 280 times faster than large language model (LLM)-based approaches, enabling rapid and fine-grained evaluation for scalable development of chest X-ray report generation models.
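The abstract describes a BERT encoder topped with a multi-head regression architecture that predicts error counts for six clinical categories. The following is a minimal sketch of that idea; the encoder checkpoint, head design, and pooling choice are assumptions for illustration, not the authors' released code.

```python
# Sketch of a multi-head regression model over a BERT encoder, in the spirit
# of the CREPE description above. Checkpoint name, head design, and category
# count handling are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ERROR_CATEGORIES = 6  # CREPE predicts counts for six clinical error categories


class MultiHeadErrorCounter(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One small regression head per error category.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden // 2), nn.GELU(), nn.Linear(hidden // 2, 1))
            for _ in range(ERROR_CATEGORIES)
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                    # [CLS] representation
        counts = torch.cat([head(cls) for head in self.heads], dim=-1)
        return torch.relu(counts)                            # error counts are non-negative


# Example: score a candidate report against a reference report.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultiHeadErrorCounter()
batch = tokenizer("reference report text", "candidate report text",
                  return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    predicted_counts = model(batch["input_ids"], batch["attention_mask"])
print(predicted_counts.shape)  # torch.Size([1, 6])
```

Because the model runs a single encoder forward pass per report pair, rather than prompting an LLM, this kind of design is consistent with the large speedup reported over LLM-based evaluation.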

2020

CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation
Jungjun Kim | Hanbin Ko | Jialin Wu
Proceedings of the 28th International Conference on Computational Linguistics

Daily scenes are complex in the real world due to occlusion, undesired lighting conditions, etc. Although humans handle those complicated environments well, they evoke challenges for machine learning systems to identify and describe the target without ambiguity. Most previous research focuses on mining discriminating features within the same category for the target object. One the other hand, as the scene becomes more complicated, human frequently uses the neighbor objects as complementary information to describe the target one. Motivated by that, we propose a novel Complementary Neighboring-based Attention Network (CoNAN) that explicitly utilizes the visual differences between the target object and its highly-related neighbors. These highly-related neighbors are determined by an attentional ranking module, as complementary features, highlighting the discriminating aspects for the target object. The speaker module then takes the visual difference features as an additional input to generate the expression. Our qualitative and quantitative results on the dataset RefCOCO, RefCOCO+, and RefCOCOg demonstrate that our generated expressions outperform other state-of-the-art models by a clear margin.