Integrating Representation Subspace Mapping with Unimodal Auxiliary Loss for Attention-based Multimodal Emotion Recognition

Xulong Du, Xingnan Zhang, Dandan Wang, Yingying Xu, Zhiyuan Wu, Shiqing Zhang, Xiaoming Zhao, Jun Yu, Liangliang Lou


Abstract
Multimodal emotion recognition (MER) aims to identify emotions by utilizing affective information from multiple modalities. Because these heterogeneous modalities differ inherently, there is a large modality gap between their representations, which makes fusing multiple modalities for MER challenging. To address this issue, this work proposes a novel attention-based MER framework that integrates representation subspace mapping with a unimodal auxiliary loss to enhance multimodal fusion. First, a representation subspace mapping module maps each modality into two distinct subspaces. One is modality-public, enabling the acquisition of common representations and reducing the discrepancies across modalities. The other is modality-unique, retaining the unique characteristics of each modality while eliminating redundant inter-modal attributes. Then, cross-modality attention is used to bridge the modality gap in the unique representations and facilitate modality adaptation. Additionally, our method introduces a unimodal auxiliary loss to remove noise unrelated to emotion classification, resulting in robust and meaningful representations for MER. Comprehensive experiments are conducted on the IEMOCAP and MSP-Improv datasets, and the experimental results show that our method outperforms state-of-the-art MER methods.
Keywords: Multimodal emotion recognition, representation subspace mapping, cross-modality attention, unimodal auxiliary loss, fusion
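The abstract does not give formulas, so the following is only a minimal, generic sketch of two of the ideas it names: attention in which one modality queries another, and a fused classification loss combined with per-modality auxiliary losses. The function names, the use of scaled dot-product attention and cross-entropy, and the `aux_weight` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from another (an assumed, generic formulation)."""
    d = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key of the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def cross_entropy(logits, label):
    """Cross-entropy of a single example given raw class logits."""
    return -math.log(softmax(logits)[label])


def total_loss(fused_logits, unimodal_logits, label, aux_weight=0.5):
    """Fused loss plus a weighted sum of per-modality auxiliary losses.
    `aux_weight` is a hypothetical hyperparameter, not taken from the paper."""
    main = cross_entropy(fused_logits, label)
    aux = sum(cross_entropy(logits, label) for logits in unimodal_logits)
    return main + aux_weight * aux
```

For example, `cross_attention(audio_feats, text_feats, text_feats)` would let each audio vector attend over the text vectors, and `total_loss` would then add one auxiliary term per modality-specific classifier; both variable names here are placeholders.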
Anthology ID:
2024.lrec-main.799
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
9120–9130
URL:
https://aclanthology.org/2024.lrec-main.799
Cite (ACL):
Xulong Du, Xingnan Zhang, Dandan Wang, Yingying Xu, Zhiyuan Wu, Shiqing Zhang, Xiaoming Zhao, Jun Yu, and Liangliang Lou. 2024. Integrating Representation Subspace Mapping with Unimodal Auxiliary Loss for Attention-based Multimodal Emotion Recognition. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9120–9130, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Integrating Representation Subspace Mapping with Unimodal Auxiliary Loss for Attention-based Multimodal Emotion Recognition (Du et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2024.lrec-main.799.pdf