Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, Zuozhu Liu
Abstract
Recent advancements in general-purpose or domain-specific multimodal large language models (MLLMs) have brought remarkable progress to medical decision-making. However, these models are designed for specific classification or generative tasks and require training or finetuning on large-scale datasets with sizeable parameter counts and substantial compute, which hinders their clinical utility in diverse resource-constrained scenarios. In this paper, we propose Med-MoE (Mixture-of-Experts), a novel and lightweight framework that tackles both discriminative and generative multimodal medical tasks. The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning. After aligning medical images with LLM tokens, we enable the model to handle different multimodal medical tasks via instruction tuning, together with a trainable router tailored for expert selection across input modalities. Finally, the model is tuned by integrating the router with multiple domain-specific experts, which are selectively activated and further empowered by meta experts. Comprehensive experiments on both open- and closed-ended medical visual question answering (Med-VQA) and image classification tasks across datasets such as VQA-RAD, SLAKE, and Path-VQA demonstrate that our model achieves performance superior to or on par with state-of-the-art baselines while requiring only approximately 30%-50% of the activated model parameters. Extensive analysis and ablations corroborate the effectiveness and practical utility of our method.
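The following is a minimal, illustrative PyTorch sketch of the domain-specific MoE routing described in the abstract; it is not the authors' implementation. It assumes top-1 hard routing over a small set of domain experts plus a single always-active meta expert, and all module names (FFNExpert, MedMoELayer) and hyperparameters are hypothetical.

```python
# Illustrative sketch only: a router selects one domain expert per token,
# while a shared "meta" expert is always active. Names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNExpert(nn.Module):
    """A standard feed-forward expert block."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class MedMoELayer(nn.Module):
    """Router + domain-specific experts + always-active meta expert (sketch)."""
    def __init__(self, d_model: int, d_hidden: int, num_domain_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_domain_experts)
        self.domain_experts = nn.ModuleList(
            [FFNExpert(d_model, d_hidden) for _ in range(num_domain_experts)]
        )
        self.meta_expert = FFNExpert(d_model, d_hidden)

    def forward(self, x):                        # x: (batch, seq, d_model)
        logits = self.router(x)                  # (batch, seq, num_experts)
        probs = F.softmax(logits, dim=-1)
        top_w, top_idx = probs.max(dim=-1)       # top-1 routing per token
        out = self.meta_expert(x)                # meta expert is always on
        for i, expert in enumerate(self.domain_experts):
            mask = (top_idx == i).unsqueeze(-1)  # tokens routed to expert i
            if mask.any():
                out = out + mask.float() * top_w.unsqueeze(-1) * expert(x)
        return out


# Usage: route a toy batch of token embeddings through the layer.
layer = MedMoELayer(d_model=64, d_hidden=256, num_domain_experts=4)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Under this sketch, selective activation is what keeps the active parameter count low: per token, only the meta expert and one routed domain expert contribute to the output.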
- Anthology ID: 2024.findings-emnlp.221
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3843–3860
- URL: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.221/
- DOI: 10.18653/v1/2024.findings-emnlp.221
- Cite (ACL): Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, and Zuozhu Liu. 2024. Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3843–3860, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models (Jiang et al., Findings 2024)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.221.pdf