Shadow-Activated Backdoor Attacks on Multimodal Large Language Models

Ziyi Yin; Muchao Ye; Yuanpu Cao; Jiaqi Wang; Aofei Chang; Han Liu; Jinghui Chen; Ting Wang; Fenglong Ma

Abstract

This paper investigates a novel backdoor attack scenario to uncover security risks that Multimodal Large Language Models (MLLMs) face during multi-round, open-ended conversations with users. In practical use, users have full control over the interaction with the model: they supply their own photos and pose arbitrary open-ended questions. Traditional backdoor attacks that rely on adding external triggers are therefore less applicable. To this end, we introduce a new shadow-activated backdoor attack paradigm, in which the attack implicitly injects malicious content into an MLLM's response whenever the response explicitly relates to the shadowed object, i.e., without any trigger. To realize this attack, we present a framework named BadMLLM, which constructs a poisoned dataset using GPT-4 Vision and applies an attention-regularized tuning strategy to address the semantic discontinuity between the original response and the inserted promotion. Extensive experiments on five MLLMs, three objects, and two types of promotion slogans show that the attack achieves both its efficacy and utility goals, highlighting significant risks concealed within MLLMs.

Anthology ID: 2025.findings-acl.248
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues: Findings | WS
Publisher: Association for Computational Linguistics
Pages: 4808–4829
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.248/
Cite (ACL): Ziyi Yin, Muchao Ye, Yuanpu Cao, Jiaqi Wang, Aofei Chang, Han Liu, Jinghui Chen, Ting Wang, and Fenglong Ma. 2025. Shadow-Activated Backdoor Attacks on Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4808–4829, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Shadow-Activated Backdoor Attacks on Multimodal Large Language Models (Yin et al., Findings 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.248.pdf
