TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection

Zehong Yan, Peng Qi, Wynne Hsu, Mong-Li Lee


Abstract
Multimodal misinformation, encompassing textual, visual, and cross-modal distortions, poses an increasing societal threat that is amplified by generative AI. Existing methods typically focus on a single type of distortion and struggle to generalize to unseen scenarios. In this work, we observe that different distortion types share common reasoning capabilities while also requiring task-specific skills. We hypothesize that joint training across distortion types facilitates knowledge sharing and enhances the model’s ability to generalize. To this end, we introduce TRUST-VL, a unified and explainable vision-language model for general multimodal misinformation detection. TRUST-VL incorporates a novel Question-Aware Visual Amplifier module, designed to extract task-specific visual features. To support training, we also construct TRUST-Instruct, a large-scale instruction dataset containing 198K samples featuring structured reasoning chains aligned with human fact-checking workflows. Extensive experiments on both in-domain and zero-shot benchmarks demonstrate that TRUST-VL achieves state-of-the-art performance, while also offering strong generalization and interpretability.
Anthology ID:
2025.emnlp-main.284
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5588–5604
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.284/
Cite (ACL):
Zehong Yan, Peng Qi, Wynne Hsu, and Mong-Li Lee. 2025. TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5588–5604, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection (Yan et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.284.pdf
Checklist:
2025.emnlp-main.284.checklist.pdf