@inproceedings{zhang-etal-2025-mad,
title = "{T}-{MAD}: Target-driven Multimodal Alignment for Stance Detection",
author = "Zhang, ZhaoDan and
Zhang, Jin and
Cheng, Xueqi and
Xu, Hui",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.30/",
pages = "580--595",
ISBN = "979-8-89176-332-6",
    abstract = "Multimodal Stance Detection (MSD) aims to determine a user{'}s stance - support, oppose, or neutral - toward a target by analyzing multimodal content such as texts and images from social media. Existing MSD methods struggle with generalizing to unseen targets and handling modality inconsistencies. To address these challenges, we propose the Target-driven Multi-modal Alignment and Dynamic Weighting Model (T-MAD), which combines target-driven multi-modal alignment and dynamic weighting mechanisms to capture target-specific relationships and balance modality contributions. The model incorporates iterative reasoning to progressively refine predictions, achieving robust performance in both in-target and zero-shot settings. Experiments on the MMSD and MultiClimate datasets show that T-MAD outperforms state-of-the-art models, with optimal results achieved using RoBERTa, ViT, and an iterative depth of 5. Ablation studies further confirm the importance of multi-modal alignment and dynamic weighting in enhancing model effectiveness."
}