Multi-modal sarcasm detection aims to identify whether a given image-text pair is sarcastic. The key to the task lies in accurately capturing incongruities across different modalities. Although existing studies have achieved impressive success, they have focused primarily on fusing textual and visual information to establish cross-modal correlations, overlooking the original unimodal incongruity information at the text level and image level. Furthermore, their cross-modal fusion strategies neglect the effect that the inherent ambiguity within the text and image modalities has on multimodal fusion. To overcome these limitations, we propose a novel Ambiguity-aware Multi-level Incongruity Fusion Network (AMIF) for multi-modal sarcasm detection. Our method employs a multi-level incongruity learning module to capture incongruity information simultaneously at the text level, image level, and cross-modal level. Additionally, an ambiguity-based fusion module is developed to dynamically learn reasonable weights and interpretably aggregate the incongruity features from different levels. Comprehensive experiments conducted on a publicly available dataset demonstrate the superiority of our proposed model over state-of-the-art methods.
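To make the ambiguity-based fusion concrete, the following minimal PyTorch sketch shows one plausible way to weight text-level, image-level, and cross-modal-level incongruity features by learned ambiguity scores. The module name, the linear ambiguity scorers, and the inverse-ambiguity softmax weighting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AmbiguityAwareFusion(nn.Module):
    """Hypothetical sketch: aggregate text-level, image-level and cross-modal-level
    incongruity features with weights derived from per-level ambiguity scores."""

    def __init__(self, dim: int):
        super().__init__()
        # One ambiguity scorer per level; the scorer design is an assumption.
        self.scorers = nn.ModuleList([nn.Linear(dim, 1) for _ in range(3)])
        self.classifier = nn.Linear(dim, 2)  # sarcastic vs. non-sarcastic

    def forward(self, text_inc, image_inc, cross_inc):
        # Each input: (batch, dim); stack the three levels: (batch, 3, dim).
        levels = torch.stack([text_inc, image_inc, cross_inc], dim=1)
        ambiguity = torch.cat(
            [scorer(levels[:, i]) for i, scorer in enumerate(self.scorers)], dim=1
        )  # (batch, 3)
        # Higher ambiguity -> lower weight: negate scores before the softmax.
        weights = F.softmax(-ambiguity, dim=1)             # (batch, 3)
        fused = (weights.unsqueeze(-1) * levels).sum(dim=1)  # (batch, dim)
        return self.classifier(fused), weights  # logits and interpretable level weights
```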
Argument pair extraction (APE) aims to extract interactive argument pairs from two argument passages. Existing works generally focus on either simple argument interaction or task-form conversion, rather than thoroughly exploiting the deep-level features of argument pairs. To address this issue, we propose Semantics-Aware Dual Graph Convolutional Networks (SADGCN) for APE. Specifically, a co-occurring word graph is designed to capture the lexical and semantic relevance of arguments with a pre-trained Rouge-guided Transformer (ROT). Considering the topic relevance within argument pairs, a topic graph is constructed by a neural topic model to leverage the topic information of the argument passages. The two graphs are fused via a gating mechanism, which contributes to the extraction of argument pairs. Experimental results indicate that our approach achieves state-of-the-art performance, improving the F1 score by 6.56% over the best existing alternative.
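As an illustration of the gating mechanism that fuses the two graph representations, the following minimal PyTorch sketch blends features produced by the co-occurring word graph and the topic graph with an element-wise sigmoid gate. The class name, feature shapes, and the specific gate formulation are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class GatedGraphFusion(nn.Module):
    """Hypothetical sketch: fuse representations from a co-occurring word graph
    and a topic graph with an element-wise gate, as the abstract describes."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_graph_repr, topic_graph_repr):
        # g in (0, 1) decides, per dimension, how much each graph contributes.
        g = torch.sigmoid(
            self.gate(torch.cat([word_graph_repr, topic_graph_repr], dim=-1))
        )
        return g * word_graph_repr + (1.0 - g) * topic_graph_repr
```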