Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation

Palaash Goel, Dushyant Singh Chauhan, Md Shad Akhtar


Abstract
Sarcasm is a linguistic phenomenon that intends to ridicule a target (e.g., an entity, event, or person) in an implicit way. Multimodal Sarcasm Explanation (MuSE) aims to reveal the intended irony in a sarcastic post through a natural language explanation. Despite its importance, existing systems have overlooked the role of the sarcasm target in generating explanations. In this paper, we propose a Target-aUgmented shaRed fusion-Based sarcasm explanatiOn model, TURBO. We design a novel shared-fusion mechanism to leverage the inter-modality relationships between an image and its caption. TURBO assumes the target of the sarcasm and uses it to guide the multimodal shared-fusion mechanism in learning the intricacies of the intended irony for explanations. We evaluate our proposed model on the dataset. Comparisons against multiple baselines and state-of-the-art models show that TURBO improves performance by an average margin of +3.3%. Moreover, we explore LLMs in zero-shot and one-shot settings for our task and observe that LLM-generated explanations, though remarkable, often fail to capture the critical nuances of the sarcasm. Finally, we supplement our study with an extensive human evaluation of TURBO's generated explanations and find them to be better than those of the other systems.
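To make the idea of target-guided cross-modal fusion more concrete, the following is a minimal sketch under stated assumptions. It is NOT TURBO's actual architecture, which the abstract does not specify. It assumes caption tokens, image patches, and the sarcasm target are already encoded as vectors of the same dimension, reads "shared" as one projection matrix W reused in both fusion directions, and injects the target by adding its vector to every attention query. All function names (target_guided_attend, shared_fusion) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def matvec(W: Matrix, v: Vector) -> Vector:
    # Apply the (hypothetical) shared projection W to a vector.
    return [dot(row, v) for row in W]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax; assumes xs is non-empty.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def target_guided_attend(queries: List[Vector], keys: List[Vector],
                         target: Vector, W: Matrix) -> List[Vector]:
    """Each query attends over the other modality; the target vector is
    added to the query so attention is biased toward target-related content
    (an illustrative assumption, not the paper's formulation)."""
    d = len(target)
    fused = []
    for q in queries:
        q_proj = matvec(W, add(q, target))  # target-augmented query
        scores = [dot(q_proj, matvec(W, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the other modality's vectors.
        context = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)]
        fused.append(add(q, context))  # residual fusion
    return fused


def shared_fusion(text: List[Vector], image: List[Vector], target: Vector,
                  W: Matrix) -> Tuple[List[Vector], List[Vector]]:
    # The same W is reused in both directions (text->image and image->text).
    fused_text = target_guided_attend(text, image, target, W)
    fused_image = target_guided_attend(image, text, target, W)
    return fused_text, fused_image


W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[0.5, 0.5]]
target = [0.2, 0.1]
fused_text, fused_image = shared_fusion(text, image, target, W)
print(fused_text)  # [[1.5, 0.5], [0.5, 1.5]]: one patch gets all attention
```

With a single image patch, every caption token places all of its attention on that patch, so each fused caption vector is simply the token plus the patch. In a trained model W, the encoders, and the way the target is obtained would all be learned, and the fused representations would feed a text decoder that generates the explanation.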
Anthology ID: 2025.findings-naacl.472
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8480–8493
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.472/
Cite (ACL): Palaash Goel, Dushyant Singh Chauhan, and Md Shad Akhtar. 2025. Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8480–8493, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation (Goel et al., Findings 2025)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.472.pdf