Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network

Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, Ruifeng Xu


Abstract
With the increasing popularity of posting multimodal messages online, many recent studies have been carried out utilizing both textual and visual information for multi-modal sarcasm detection. In this paper, we investigate multi-modal sarcasm detection from a novel perspective by constructing a cross-modal graph for each instance to explicitly draw the ironic relations between textual and visual modalities. Specifically, we first detect the objects paired with descriptions of the image modality, enabling the learning of important visual information. Then, the descriptions of the objects are served as a bridge to determine the importance of the association between the objects of image modality and the contextual words of text modality, so as to build a cross-modal graph for each multi-modal instance. Furthermore, we devise a cross-modal graph convolutional network to make sense of the incongruity relations between modalities for multi-modal sarcasm detection. Extensive experimental results and in-depth analysis show that our model achieves state-of-the-art performance in multi-modal sarcasm detection.
Anthology ID:
2022.acl-long.124
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1767–1777
Language:
URL:
https://aclanthology.org/2022.acl-long.124
DOI:
10.18653/v1/2022.acl-long.124
Bibkey:
Cite (ACL):
Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, and Ruifeng Xu. 2022. Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1767–1777, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network (Liang et al., ACL 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/naacl24-info/2022.acl-long.124.pdf
Software:
 2022.acl-long.124.software.zip
Code
 hitsz-hlt/cmgcn