Abstract
Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and it has received increasing attention for its applications in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodal fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that provides deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter- and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieves state-of-the-art (SOTA) performance compared with all baselines. Code is released on GitHub (https://anonymous.4open.science/r/MERC-7F88).

- Anthology ID: 2023.emnlp-main.996
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 16051–16069
- URL: https://aclanthology.org/2023.emnlp-main.996
- DOI: 10.18653/v1/2023.emnlp-main.996
- Cite (ACL): Dongyuan Li, Yusong Wang, Kotaro Funakoshi, and Manabu Okumura. 2023. Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16051–16069, Singapore. Association for Computational Linguistics.
- Cite (Informal): Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition (Li et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.emnlp-main.996.pdf