NEUI at MEDIQA-M3G 2024: Medical VQA through consensus

Ricardo García, Oscar Lithgow-Serrano


Abstract
This document describes our solution to the MEDIQA-M3G shared task: Multilingual & Multimodal Medical Answer Generation. To build our solution, we leveraged two pre-trained models: a Visual Language Model (VLM) and a Large Language Model (LLM). We fine-tuned the VLM and the LLM on the MEDIQA-M3G and MEDIQA-CORR training datasets, respectively. In the first stage, the VLM produces one response for each image-text input pair in a case. In the second stage, the LLM consolidates the VLM responses, using them as context alongside the original text input. By replacing the English case content field in the second-stage context with its Spanish counterpart, we adapt the pipeline to generate submissions in both English and Spanish. We performed an ablation study to explore the impact of the models' different capabilities, such as multimodality and reasoning, on the MEDIQA-M3G task. Our approach favored privacy and feasibility by adopting small, open-source, self-hosted models, and ranked 4th in English and 2nd in Spanish.
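The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the `Case` fields, the `build_context` prompt layout, and the `vlm` and `llm` callables are all hypothetical stand-ins for the fine-tuned models, and the sketch assumes the first stage always receives the English text, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """Hypothetical container for one case (field names are assumptions)."""
    images: List[str]  # image paths or identifiers
    text_en: str       # case content in English
    text_es: str       # case content in Spanish


def build_context(case_text: str, vlm_responses: List[str]) -> str:
    """Combine the case text with the per-image VLM responses (assumed layout)."""
    numbered = "\n".join(
        f"{i}. {resp}" for i, resp in enumerate(vlm_responses, start=1)
    )
    return f"Case description:\n{case_text}\n\nPer-image responses:\n{numbered}"


def answer_case(
    case: Case,
    vlm: Callable[[str, str], str],  # (image, text) -> response; hypothetical
    llm: Callable[[str], str],       # prompt -> consolidated answer; hypothetical
    lang: str = "en",
) -> str:
    if lang not in ("en", "es"):
        raise ValueError(f"unsupported language: {lang}")

    # Stage 1: one VLM response per image-text pair in the case.
    # Assumption: the English text is used here for both languages.
    responses = [vlm(image, case.text_en) for image in case.images]

    # Stage 2: the LLM consolidates the responses, with the case text as
    # context. Switching languages only swaps the case text in this context.
    case_text = case.text_en if lang == "en" else case.text_es
    return llm(build_context(case_text, responses))
```

With real models, `vlm` and `llm` would wrap the respective inference calls; any plain callables with the same shapes work for trying out the flow.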
Anthology ID:
2024.clinicalnlp-1.45
Volume:
Proceedings of the 6th Clinical Natural Language Processing Workshop
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues:
ClinicalNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
448–460
URL:
https://aclanthology.org/2024.clinicalnlp-1.45
Cite (ACL):
Ricardo García and Oscar Lithgow-Serrano. 2024. NEUI at MEDIQA-M3G 2024: Medical VQA through consensus. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 448–460, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
NEUI at MEDIQA-M3G 2024: Medical VQA through consensus (García & Lithgow-Serrano, ClinicalNLP-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.clinicalnlp-1.45.pdf