Towards a Radiologist Imitation Framework for 3D CT Diagnosis with Multimodal LLMs

Kaidi Zhang; Zhiyuan Yan; Gao Cheng; Zhenyang Cai

Abstract

Three-dimensional Computed Tomography (3D CT) is a cornerstone of precision medicine. Most AI diagnostic models analyze large numbers of CT slices uniformly, treating all slices as equally important. While this has partly accelerated radiologists’ workflows, it overlooks that clinically relevant information is often sparsely distributed throughout a volume. Without targeted or weighted processing, fine-grained cues may be missed and substantial computation wasted on diagnostically uninformative slices. We propose a radiologist-simulating framework for selective and efficient 3D CT interpretation. Evaluated on a 3D CT dataset covering eight thoracic lesion types, it was compared with state-of-the-art multimodal large language models such as GPT-4o and supervised visual backbones including ViT and ResNet-50. Using accuracy, F1-score, AUC, and blind radiologist assessment, Screen-CLIP achieved an AUC of 0.87 and F1-score of 0.82, surpassing ViT-Base (AUC: 0.84). For report generation, our method outperformed M3D across all metrics, reaching a BLEU-Avg of 29.03, and achieved the highest average Doctors’ Score (6.16/10) in a preliminary human evaluation.

Anthology ID: 2026.bionlp-1.85
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 1056–1065
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.85/
Cite (ACL): Kaidi Zhang, Zhiyuan Yan, Gao Cheng, and Zhenyang Cai. 2026. Towards a Radiologist Imitation Framework for 3D CT Diagnosis with Multimodal LLMs. In BioNLP 2026, pages 1056–1065, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Towards a Radiologist Imitation Framework for 3D CT Diagnosis with Multimodal LLMs (Zhang et al., BioNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.85.pdf
