Collin Campbell
2026
Overview of the MedGenVidQA 2026 Shared Task on Medical Generative Video Question Answering
Deepak Gupta | Collin Campbell | Pedram Golnari | Dina Demner-Fushman
BioNLP 2026
This paper presents an overview of the MedGenVidQA 2026 shared task on medical video question answering, co-located with the 25th BioNLP workshop at ACL 2026. The shared task addressed three related sub-tasks in medical multimodal (textual and video) question answering: (i) multimodal retrieval, (ii) multimodal answer generation with citations, and (iii) visual answer localization. The central goal of the shared task is to develop reliable multimodal question answering systems for consumers and medical professionals by leveraging generative models. Nine teams participated and made forty-three submissions in total across all tasks. We evaluated the submissions using both automated and human assessments. This paper describes the tasks, datasets, evaluation metrics, participation, and baseline systems for all three tasks. We also summarize the approaches explored by the participating teams and their evaluation results. Finally, we discuss the key findings and their implications for the development of multimodal medical question answering.