X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA

Min Hyuk Kim, Changheon Kim, Seok Bong Yoo


Abstract
Medical visual question answering (VQA) and federated learning (FL) have emerged as vital approaches for enabling privacy-preserving, collaborative learning across clinical institutions. However, both these approaches face significant challenges in cross-modal FL scenarios, where each client possesses unpaired images from only one modality. To address this limitation, we propose X-FLoRA, a cross-modal FL framework that uses modality-expert low-rank adaptation (LoRA) for medical VQA. Specifically, X-FLoRA enables the synthesis of images from one modality to another without requiring data sharing between clients. This is achieved by training a backward translation model within a federated asymmetric translation scheme that integrates clinical semantics from textual data. Additionally, X-FLoRA introduces modality-expert LoRA, which fine-tunes separate LoRA modules to strengthen modality-specific representations in the VQA task. The server aggregates the trained backward translation models and fine-tuned LoRA modules using discriminator quality scores and expert-aware weighting, which regulate the relative contributions from different clients. Experiments were conducted on VQA datasets encompassing different medical modalities, and the results demonstrate that X-FLoRA outperforms existing FL methods in terms of VQA performance.
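The server-side aggregation step mentioned in the abstract can be pictured as a score-weighted average of client updates. The sketch below is only an illustration under stated assumptions: the abstract does not give the weighting formula, so the linear normalization of scores, the `Params` layout, and the function names `normalize_scores` and `weighted_aggregate` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def normalize_scores(scores: List[float]) -> List[float]:
    """Turn non-negative per-client scores into weights that sum to 1."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total == 0:
        # Fall back to a uniform average when no client has a positive score.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_aggregate(client_params: List[Params], scores: List[float]) -> Params:
    """Score-weighted average of client parameters (FedAvg-style).

    Assumption: `scores` stands in for whatever per-client quality or
    expert-aware score the server uses; the paper's actual rule may differ.
    """
    if not client_params or len(client_params) != len(scores):
        raise ValueError("need one score per client")
    weights = normalize_scores(scores)
    aggregated: Params = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        aggregated[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(size)
        ]
    return aggregated


# Two clients sharing one (hypothetical) LoRA parameter; the second client
# has three times the score of the first, so it gets weight 0.75.
clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
result = weighted_aggregate(clients, [1.0, 3.0])
print(result)  # {'lora_A': [2.5, 3.5]}
```

A higher score pulls the aggregate toward that client's parameters, which is one way scores could "regulate the relative contributions from different clients". Note that averaging LoRA factors element-wise, as done here, is a simplification: it is not equivalent to averaging the low-rank products they represent.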
Anthology ID:
2025.emnlp-main.422
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8390–8408
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.422/
DOI:
10.18653/v1/2025.emnlp-main.422
Cite (ACL):
Min Hyuk Kim, Changheon Kim, and Seok Bong Yoo. 2025. X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8390–8408, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA (Kim et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.422.pdf
Checklist:
 2025.emnlp-main.422.checklist.pdf