Ricardo García

2024

NEUI at MEDIQA-M3G 2024: Medical VQA through consensus
Ricardo García | Oscar Lithgow-Serrano
Proceedings of the 6th Clinical Natural Language Processing Workshop

This document describes our solution to the MEDIQA-M3G task (Multilingual & Multimodal Medical Answer Generation). To build our solution, we leveraged two pre-trained models: a Visual Language Model (VLM) and a Large Language Model (LLM). We fine-tuned them on the MEDIQA-M3G and MEDIQA-CORR training datasets, respectively. In the first stage, the VLM produces a single response for each image and text pair in a case. In the second stage, the LLM consolidates the VLM responses, using them as context alongside the original text input. By replacing the English case content field in the context component of the second stage with its Spanish counterpart, we adapt the pipeline to generate submissions in both English and Spanish. We performed an ablation study to explore the impact of the different models’ capabilities, such as multimodality and reasoning, on the MEDIQA-M3G task. Our approach favored privacy and feasibility by adopting small, open-source, self-hosted models, and it ranked 4th in English and 2nd in Spanish.
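The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `answer_case` and `build_prompt`, the prompt wording, the `"en"`/`"es"` dictionary keys, and the choice to feed the English text to the first stage are all assumptions, and the two models are passed in as plain callables rather than tied to any particular library.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces: the abstract does not name the models or their APIs.
VLM = Callable[[str, str], str]  # (image_path, case_text) -> one answer
LLM = Callable[[str], str]       # prompt -> consolidated answer


def build_prompt(context: str, candidates: List[str]) -> str:
    """Assemble the second-stage prompt (wording is an assumption)."""
    lines = [f"Case: {context}", "Candidate answers:"]
    lines += [f"{i}. {c}" for i, c in enumerate(candidates, start=1)]
    lines.append("Consolidated answer:")
    return "\n".join(lines)


def answer_case(
    images: List[str],
    case_text: Dict[str, str],
    vlm: VLM,
    llm: LLM,
    lang: str = "en",
) -> str:
    # Stage 1: one VLM response per (image, text) pair in the case.
    # Assumption: the first stage always sees the English text.
    per_image = [vlm(image, case_text["en"]) for image in images]

    # Stage 2: the LLM consolidates the VLM responses. Only the case
    # content placed in the context switches language.
    prompt = build_prompt(case_text[lang], per_image)
    return llm(prompt)
```

Calling `answer_case(images, case_text, vlm, llm, lang="es")` would leave the first stage unchanged and swap only the case content given to the LLM, which mirrors how the abstract describes producing the Spanish submission.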

Co-authors

Oscar Lithgow-Serrano (1)

Venues

clinicalnlp (1)
ws (1)
