Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Cuong Ha, Shima Asaadi, Sanjeev Kumar Karn, Oladimeji Farri, Tobias Heimann, Thomas Runkler
Abstract
Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question-answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.
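The abstract describes fusing a domain-adapted vision encoder with a language model under parameter-efficient training. Below is a minimal PyTorch sketch of that general idea: both backbones are frozen and only a small projection that maps image features into the language model's embedding space is trained. The module names, dimensions, single projection layer, and the HuggingFace-style `inputs_embeds` call are illustrative assumptions, not the authors' exact architecture or three-stage training recipe.

```python
# Minimal sketch: fuse a frozen, domain-adapted vision encoder with a frozen
# language model via a small trainable projection (a stand-in for
# parameter-efficient adaptation such as LoRA). Names and dimensions are
# assumptions for illustration only.
import torch
import torch.nn as nn


class VisionLanguageFusion(nn.Module):
    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1024, text_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.language_model = language_model
        # Freeze both backbones; only the projection below receives gradients.
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.language_model.parameters():
            p.requires_grad = False
        self.projection = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        # Encode the image into patch features, project them into the language
        # model's embedding space, and prepend them to the question-token
        # embeddings before decoding an answer.
        vision_feats = self.vision_encoder(image)        # (B, N, vision_dim)
        vision_tokens = self.projection(vision_feats)    # (B, N, text_dim)
        fused = torch.cat([vision_tokens, text_embeds], dim=1)
        # Assumes a HuggingFace-style decoder that accepts `inputs_embeds`.
        return self.language_model(inputs_embeds=fused)
```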
- Anthology ID:
- 2024.clinicalnlp-1.21
- Volume:
- Proceedings of the 6th Clinical Natural Language Processing Workshop
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
- Venues:
- ClinicalNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 246–257
- URL:
- https://aclanthology.org/2024.clinicalnlp-1.21
- DOI:
- 10.18653/v1/2024.clinicalnlp-1.21
- Cite (ACL):
- Cuong Ha, Shima Asaadi, Sanjeev Kumar Karn, Oladimeji Farri, Tobias Heimann, and Thomas Runkler. 2024. Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 246–257, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering (Ha et al., ClinicalNLP-WS 2024)
- PDF:
- https://aclanthology.org/2024.clinicalnlp-1.21.pdf