Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection

Jiecheng Zhang, C.L.Philip Chen, Shuzhen Li, Tong Zhang


Abstract
Multi-modal sarcasm detection (MSD) aims to identify sarcasm in text-image pairs and thereby understand users' real attitudes. Most MSD research treats the incongruity of text-image pairs as the sarcasm signal and explores it through consistency preference methods. However, these methods prioritize consistency over incongruity, and their global feature aggregation mechanisms blur incongruity information, leading to incongruity distortions and model misinterpretations. To address these issues, this paper proposes a pioneering inconsistency preference method, the incongruity-aware tension field network (ITFNet), for multi-modal sarcasm detection. Specifically, ITFNet extracts effective text-image feature pairs from factual and sentiment perspectives. It then constructs fact and sentiment tension fields with discrepancy metrics to capture the contextual tone and the polarized incongruity after iterative learning of tension intensity, effectively highlighting incongruity information during this inconsistency preference learning. It further standardizes the polarized incongruity with reference to the contextual tone to obtain standardized incongruity, implementing instance standardization for unbiased decision-making in MSD. Through the incongruity-aware tension field, ITFNet extracts salient and standardized incongruity, significantly mitigating incongruity distortions and cross-instance variance. Moreover, ITFNet achieves state-of-the-art performance, surpassing LLaVA1.5-7B with only 17.3M trainable parameters, which demonstrates its optimal balance of performance and efficiency in multi-modal sarcasm detection.
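The abstract does not give ITFNet's equations, so the following is only a toy illustration of the general idea of instance standardization, not the paper's method. It assumes a hypothetical discrepancy metric (1 minus cosine similarity between a text feature and an image feature) and a z-score style standardization in which the instance's own mean discrepancy serves as a hypothetical stand-in for the "contextual tone". The function names `cosine_discrepancy` and `standardize_incongruity` and the 2-D toy features are invented for illustration.

```python
import math
from statistics import mean, pstdev


def cosine_discrepancy(u, v):
    """Hypothetical discrepancy metric: 1 - cosine similarity, in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # cosine is undefined for a zero vector; treat it as neutral
    return 1.0 - dot / norm


def standardize_incongruity(scores, eps=1e-6):
    """Z-score each discrepancy against this instance's own mean and spread.

    The instance mean is a hypothetical stand-in for the 'contextual tone';
    this is an illustrative assumption, not ITFNet's actual formulation.
    """
    baseline = mean(scores)
    spread = pstdev(scores)
    return [(s - baseline) / (spread + eps) for s in scores]


# Toy instance: three (text feature, image feature) pairs with made-up 2-D features.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

scores = [cosine_discrepancy(t, i) for t, i in zip(text_feats, image_feats)]
z = standardize_incongruity(scores)
print([round(x, 3) for x in z])  # → [-0.707, 1.414, -0.707]
```

In this sketch, the pair whose discrepancy stands out from the rest of its own instance receives the largest standardized score, independent of the instance's overall discrepancy level; that cross-instance comparability is the property the abstract attributes to instance standardization.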
Anthology ID: 2025.acl-long.705
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14499–14508
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.705/
Cite (ACL): Jiecheng Zhang, C.L.Philip Chen, Shuzhen Li, and Tong Zhang. 2025. Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14499–14508, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection (Zhang et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.705.pdf