Reinforced Cross-modal Alignment for Radiology Report Generation

Han Qin, Yan Song


Abstract
Medical images are widely used in clinical decision-making, and writing radiology reports is a task where automatic solutions can alleviate physicians’ workload. Radiology report generation is an image-text task in which cross-modal mappings between images and texts play an important role in generating high-quality reports. Although previous studies attempt to facilitate this alignment through co-attention mechanisms under supervised settings, they lack valid and accurate correspondences because no annotation of such alignment is available. In this paper, we propose an approach that applies reinforcement learning (RL) over a cross-modal memory (CMM) to better align visual and textual features for radiology report generation. Specifically, a shared memory records the mappings between visual and textual information, and the proposed reinforced algorithm learns a signal from the reports to guide the cross-modal alignment, even though such reports are not directly related to how images and texts are mapped. Experimental results on two English radiology report datasets, IU X-Ray and MIMIC-CXR, show the effectiveness of our approach, which achieves state-of-the-art results. We further conduct a human evaluation and a case study, which confirm the validity of the reinforced algorithm in our approach.
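To make the shared-memory idea in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that visual and textual features live in the same vector space and that each one queries a single memory matrix through scaled dot-product scores over its top-k slots. The function name `query_memory`, the `top_k` parameter, and all array sizes are hypothetical choices made for illustration; the reinforcement learning signal described in the abstract is not shown.

```python
import numpy as np


def query_memory(query, memory, top_k=2):
    """Query a shared memory with one feature vector (illustrative only).

    query:  array of shape (d,), a visual or textual feature (assumed).
    memory: array of shape (n_slots, d), shared by both modalities (assumed).
    Returns the weighted memory response and the selected slot indices.
    """
    # Scaled dot-product similarity between the query and every memory slot.
    scores = memory @ query / np.sqrt(memory.shape[1])
    # Keep the top_k highest-scoring slots.
    top = np.argsort(scores)[-top_k:]
    # Softmax over the selected scores (shifted by the max for stability).
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    # The response is a convex combination of the selected slots.
    return weights @ memory[top], top


# Hypothetical sizes: 8 memory slots, 4-dimensional features.
rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 4))
visual_feature = rng.normal(size=4)
text_feature = rng.normal(size=4)

# Both modalities read from the same memory, so their responses share a space.
visual_response, visual_slots = query_memory(visual_feature, memory)
text_response, text_slots = query_memory(text_feature, memory)
```

Because both modalities read from the same slots, any training signal that changes which slots a text query selects also changes what the matching image regions can retrieve, which is one way to picture the alignment the abstract describes.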
Anthology ID:
2022.findings-acl.38
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
448–458
URL:
https://aclanthology.org/2022.findings-acl.38
DOI:
10.18653/v1/2022.findings-acl.38
Cite (ACL):
Han Qin and Yan Song. 2022. Reinforced Cross-modal Alignment for Radiology Report Generation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 448–458, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Reinforced Cross-modal Alignment for Radiology Report Generation (Qin & Song, Findings 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.findings-acl.38.pdf
Software:
2022.findings-acl.38.software.zip
Code:
cuhksz-nlp/r2genrl
Data:
CheXpert