Edmond S. L. Ho
Also published as: Edmond S.L. Ho
2025
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang | Zaiqiao Meng | Jake Lever | Edmond S. L. Ho
Findings of the Association for Computational Linguistics: ACL 2025
Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) gain visual-language understanding by aligning with pre-trained vision encoders, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully exploit the temporal information in multi-modal medical datasets. In this paper, we introduce **Libra**, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (**TAC**), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra sets a new state of the art among similarly scaled MLLMs in both clinical relevance and lexical accuracy. All source code and data are publicly available at: https://github.com/X-iZhang/Libra.
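To make the idea concrete, below is a minimal PyTorch sketch of how a temporal alignment module might fuse features from a current and a prior chest X-ray before projecting them into an LLM's embedding space. It is an illustrative approximation of the concept, not Libra's actual TAC; the class name `TemporalConnector`, the cross-attention design, and all dimensions are assumptions.

```python
# Illustrative sketch only: fuse current- and prior-study visual features so
# temporal change is visible to the language model. Not Libra's actual TAC.
import torch
import torch.nn as nn

class TemporalConnector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Cross-attention: current-image tokens attend to prior-image tokens.
        self.cross_attn = nn.MultiheadAttention(vision_dim, num_heads=8, batch_first=True)
        # Project the fused visual tokens into the LLM embedding space.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim * 2, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, curr_feats: torch.Tensor, prior_feats: torch.Tensor) -> torch.Tensor:
        # curr_feats, prior_feats: (batch, num_patches, vision_dim)
        attended, _ = self.cross_attn(curr_feats, prior_feats, prior_feats)
        # Concatenate current features with the temporally attended prior context.
        fused = torch.cat([curr_feats, attended], dim=-1)
        return self.proj(fused)  # (batch, num_patches, llm_dim)

# Example with two dummy studies of 576 patch tokens each.
connector = TemporalConnector()
curr, prior = torch.randn(1, 576, 1024), torch.randn(1, 576, 1024)
visual_tokens = connector(curr, prior)  # passed to the LLM alongside the text prompt
```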
2024
Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation
Xi Zhang | Zaiqiao Meng | Jake Lever | Edmond S.L. Ho
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
This paper introduces a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. The model combines an image encoder (CLIP) with a fine-tuned large language model (LLM) based on the Vicuna-7B architecture. The training process follows a two-stage approach: initial alignment of chest X-ray features with the LLM, followed by fine-tuning for radiology report generation. The study highlights the importance of generating both the FINDINGS and IMPRESSIONS sections of radiology reports and evaluates the model's performance across a range of metrics, showing that it generates accurate, high-quality medical reports. The research also addresses the need for domain-specific fine-tuning to capture the intricate details necessary for accurate medical interpretation and reporting.
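As a rough illustration of the two-stage recipe described in the abstract, the sketch below freezes the CLIP vision encoder and the LLM while training only a projection layer (stage 1), then unfreezes the LLM for report-generation fine-tuning (stage 2). The specific checkpoints and the `set_trainable` helper are illustrative assumptions, not the authors' released training code.

```python
# Hedged sketch of a two-stage vision-language alignment setup in
# PyTorch / Hugging Face style; checkpoint names are assumptions.
import torch.nn as nn
from transformers import CLIPVisionModel, AutoModelForCausalLM

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
llm = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# Projection from the CLIP hidden size to the LLM hidden size.
projector = nn.Linear(vision.config.hidden_size, llm.config.hidden_size)

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: feature alignment -- only the projector is updated.
set_trainable(vision, False)
set_trainable(llm, False)
set_trainable(projector, True)
# ... train on image-report pairs with a next-token prediction loss ...

# Stage 2: fine-tuning for report generation -- unfreeze the LLM.
set_trainable(llm, True)
# ... continue training so the model produces FINDINGS and IMPRESSIONS sections ...
```

Staging the training this way lets the projection layer learn a stable mapping from image features to the LLM's token space before the much larger language model is adapted to the radiology domain.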