Multimodal Dual-Path Decoding for Medical Report Generation
Jinghan Sun, Dong Wei, Zhihong Zhu, Yuyang Xue, Steven McDonagh, Xian Wu
Abstract
Radiology report generation requires precise alignment between medical imaging findings and clinically coherent textual descriptions. Current methods predominantly rely on either large vision-language models (LVLMs) for visual grounding or large language models (LLMs) for medical narrative generation, and they often fail to integrate multimodal clinical evidence with domain-specific knowledge. This paper proposes a multimodal dual-path framework that combines LVLMs and LLMs to address these limitations. Our approach establishes a dynamic fusion between LVLMs’ visual-semantic grounding capabilities and LLMs’ clinical knowledge reasoning. Specifically, we employ a structured prompting strategy that decomposes the report generation task into three clinically meaningful sections and introduces fine-grained multi-label classification prompts to guide the models, enabling more accurate and comprehensive clinical report generation. Experiments on the public MIMIC-CXR benchmark show that our framework outperforms state-of-the-art methods.
- Anthology ID:
- 2026.findings-acl.1997
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 40193–40204
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1997/
- Cite (ACL):
- Jinghan Sun, Dong Wei, Zhihong Zhu, Yuyang Xue, Steven McDonagh, and Xian Wu. 2026. Multimodal Dual-Path Decoding for Medical Report Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 40193–40204, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Multimodal Dual-Path Decoding for Medical Report Generation (Sun et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1997.pdf