Supervised Attention Mechanism for Low-quality Multimodal Data

Sijie Mai, Shiqin Han, Haifeng Hu


Abstract
In practical applications, multimodal data are often of low quality: noisy modalities and missing modalities are two typical forms that severely hinder model performance, robustness, and applicability. However, current studies typically address these two issues separately. To bridge this gap, we propose a framework for multimodal affective computing that jointly handles missing and noisy modalities to enhance model robustness in low-quality data scenarios. Specifically, we view a missing modality as a special case of a noisy modality and propose a supervised attention framework. In contrast to traditional attention mechanisms, whose parameters are updated only by the main-task loss, we design explicit supervisory signals for learning the attention weights, ensuring that the attention mechanism focuses on discriminative information and suppresses noisy information. We further propose a ranking-based optimization strategy that compares the relative importance of different interactions by imposing a ranking constraint on the attention weights, avoiding the training noise caused by inaccurate absolute labels. The proposed model consistently outperforms state-of-the-art baselines on multiple datasets under complete-modality, missing-modality, and noisy-modality settings.
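The abstract's two central ideas, direct supervision of attention weights and a ranking constraint on them, can be sketched in a few lines of PyTorch. The code below is a hypothetical illustration inferred from the abstract, not the authors' released implementation: the module name SupervisedAttention, the KL-based supervision target target_weights, the index tensors pos_idx/neg_idx, and the margin value are all assumptions made for the sake of the example.

```python
# Minimal sketch (assumptions only) of supervised attention plus a
# ranking constraint on attention weights, as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SupervisedAttention(nn.Module):
    """Attention over modality features whose weights receive their own
    supervisory signal, rather than being trained only by the main-task loss."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scalar relevance score per modality

    def forward(self, feats: torch.Tensor):
        # feats: (batch, n_modalities, dim)
        logits = self.score(feats).squeeze(-1)    # (batch, n_modalities)
        weights = torch.softmax(logits, dim=-1)   # attention weights
        fused = torch.einsum("bm,bmd->bd", weights, feats)  # weighted fusion
        return fused, weights


def attention_supervision_loss(weights, target_weights):
    """Direct supervision on attention weights. target_weights is a
    hypothetical distribution, e.g. high mass on clean modalities and
    near-zero mass on noisy or missing ones."""
    return F.kl_div(weights.clamp_min(1e-8).log(), target_weights,
                    reduction="batchmean")


def ranking_loss(weights, pos_idx, neg_idx, margin=0.1):
    """Ranking constraint: the weight of a more informative interaction
    (pos_idx) should exceed that of a less informative one (neg_idx) by a
    margin, so only relative importance is supervised, not absolute values."""
    pos = weights.gather(1, pos_idx.unsqueeze(1)).squeeze(1)
    neg = weights.gather(1, neg_idx.unsqueeze(1)).squeeze(1)
    return F.relu(margin - (pos - neg)).mean()
```

In training, such terms would be combined with the main-task loss, e.g. loss = task_loss + lambda1 * sup_loss + lambda2 * rank_loss, with the ranking term taking over from the KL term wherever absolute attention labels are too unreliable to supervise directly.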
Anthology ID:
2025.emnlp-main.1084
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
21377–21397
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1084/
Cite (ACL):
Sijie Mai, Shiqin Han, and Haifeng Hu. 2025. Supervised Attention Mechanism for Low-quality Multimodal Data. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 21377–21397, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Supervised Attention Mechanism for Low-quality Multimodal Data (Mai et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1084.pdf
Checklist:
2025.emnlp-main.1084.checklist.pdf