Lanxin Bi


2025

Two Challenges, One Solution: Robust Multimodal Learning through Dynamic Modality Recognition and Enhancement
Lanxin Bi | Yunqi Zhang | Luyi Wang | Yake Niu | Hui Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Multimodal machine learning is often hindered by two critical challenges: modality missingness and modality imbalance, both of which significantly degrade the performance of multimodal models. Most existing methods either require full-modality data during training or rely on explicit annotations to detect missing modalities, dependencies that severely limit their applicability in real-world settings. To tackle these problems, we propose *DREAM*, a Dynamic modality Recognition and Enhancement for Adaptive Multimodal fusion framework. Within DREAM, we employ a sample-level dynamic modality assessment mechanism to direct the selective reconstruction of missing or underperforming modalities. In addition, we introduce a soft masking fusion strategy that adaptively integrates modalities according to their estimated contributions, enabling more accurate and robust predictions. Experimental results on three benchmark datasets consistently show that DREAM outperforms representative baseline and state-of-the-art models, demonstrating its robustness to modality missingness and modality imbalance.
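
The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of one plausible reading of the soft masking fusion step: per-sample contribution scores are estimated for each modality and used as soft (continuous, differentiable) weights rather than hard 0/1 masks. All names here (`SoftMaskFusion`, `scorers`) are hypothetical illustrations, not the paper's actual code.

```python
# A minimal sketch of soft-masking fusion, assuming each modality yields a
# fixed-size embedding. This illustrates the general idea of weighting
# modalities by estimated contribution; it is not DREAM's actual method.
import torch
import torch.nn as nn


class SoftMaskFusion(nn.Module):
    """Fuse modality embeddings weighted by estimated per-sample contributions."""

    def __init__(self, embed_dim: int, num_modalities: int):
        super().__init__()
        # One scorer per modality estimates that modality's contribution
        # for each sample (hypothetical design choice for this sketch).
        self.scorers = nn.ModuleList(
            nn.Linear(embed_dim, 1) for _ in range(num_modalities)
        )

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        # embeddings: list of (batch, embed_dim) tensors, one per modality.
        scores = torch.cat(
            [scorer(e) for scorer, e in zip(self.scorers, embeddings)], dim=-1
        )  # (batch, num_modalities)
        # Soft mask: a differentiable weighting instead of a hard 0/1 mask,
        # so weak or reconstructed modalities are down-weighted, not dropped.
        weights = torch.softmax(scores, dim=-1)          # (batch, num_modalities)
        stacked = torch.stack(embeddings, dim=1)         # (batch, M, embed_dim)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)  # (batch, embed_dim)


# Usage: fuse text, audio, and vision embeddings of dimension 256.
fusion = SoftMaskFusion(embed_dim=256, num_modalities=3)
text, audio, vision = (torch.randn(8, 256) for _ in range(3))
fused = fusion([text, audio, vision])  # (8, 256)
```

In this reading, the same per-sample scores that drive the soft mask could also serve as the assessment signal deciding which modalities to reconstruct, but that coupling is an assumption of the sketch, not something stated in the abstract.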