Two Challenges, One Solution: Robust Multimodal Learning through Dynamic Modality Recognition and Enhancement

Lanxin Bi, Yunqi Zhang, Luyi Wang, Yake Niu, Hui Zhao


Abstract
Multimodal machine learning is often hindered by two critical challenges: modality missingness and modality imbalance. Both significantly degrade the performance of multimodal models. Most existing methods either require full-modality data during training or need explicit annotations to detect missing modalities. These dependencies severely limit the models’ applicability in the real world. To tackle these problems, we propose *DREAM*, a Dynamic modality Recognition and Enhancement framework for Adaptive Multimodal fusion. Within DREAM, we employ a sample-level dynamic modality assessment mechanism to direct selective reconstruction of missing or underperforming modalities. Additionally, we introduce a soft masking fusion strategy that adaptively integrates different modalities according to their estimated contributions, enabling more accurate and robust predictions. Experimental results on three benchmark datasets show that DREAM consistently outperforms several representative baseline and state-of-the-art models, confirming its robustness to both modality missingness and modality imbalance.
Anthology ID:
2025.findings-emnlp.689
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12855–12867
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.689/
DOI:
10.18653/v1/2025.findings-emnlp.689
Cite (ACL):
Lanxin Bi, Yunqi Zhang, Luyi Wang, Yake Niu, and Hui Zhao. 2025. Two Challenges, One Solution: Robust Multimodal Learning through Dynamic Modality Recognition and Enhancement. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 12855–12867, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Two Challenges, One Solution: Robust Multimodal Learning through Dynamic Modality Recognition and Enhancement (Bi et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.689.pdf
Checklist:
 2025.findings-emnlp.689.checklist.pdf