A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation

Xiaoheng Zhang, Yang Li


Abstract
Emotion recognition in conversation (ERC) has attracted enormous attention for its applications in empathetic dialogue systems. However, most previous researches simply concatenate multimodal representations, leading to an accumulation of redundant information and a limited context interaction between modalities. Furthermore, they only consider simple contextual features ignoring semantic clues, resulting in an insufficient capture of the semantic coherence and consistency in conversations. To address these limitations, we propose a cross-modality context fusion and semantic refinement network (CMCF-SRNet). Specifically, we first design a cross-modal locality-constrained transformer to explore the multimodal interaction. Second, we investigate a graph-based semantic refinement transformer, which solves the limitation of insufficient semantic relationship information between utterances. Extensive experiments on two public benchmark datasets show the effectiveness of our proposed method compared with other state-of-the-art methods, indicating its potential application in emotion recognition. Our model will be available at https://github.com/zxiaohen/CMCF-SRNet.
Anthology ID:
2023.acl-long.732
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
13099–13110
Language:
URL:
https://aclanthology.org/2023.acl-long.732
DOI:
10.18653/v1/2023.acl-long.732
Bibkey:
Cite (ACL):
Xiaoheng Zhang and Yang Li. 2023. A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13099–13110, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation (Zhang & Li, ACL 2023)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.732.pdf
Video:
 https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.732.mp4