Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

Yu Wang (王昱); Xiaofei Zhou; Yichen Wang; Geyuan Zhang; Tianxing He

Abstract

With the rapid advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Prior research has exposed VLMs’ vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, current jailbreak methods often fail against cutting-edge models such as GPT-4o. We attribute this to the over-exposure of harmful content and the absence of stealthy malicious guidance. In this work, we introduce a novel jailbreak framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML employs an encryption-decryption process across text and image modalities to mitigate the over-exposure of malicious information. To covertly align the model’s output with harmful objectives, MML leverages a technique we term evil alignment, framing the attack within the narrative context of a video game development scenario. Extensive experiments validate the effectiveness of MML. Specifically, MML jailbreaks GPT-4o with attack success rates of 99.40% on SafeBench, 98.81% on MM-SafeBench, and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML.

Anthology ID: 2025.acl-long.74
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1466–1494
URL: https://preview.aclanthology.org/landing_page/2025.acl-long.74/
Cite (ACL): Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, and Tianxing He. 2025. Jailbreak Large Vision-Language Models Through Multi-Modal Linkage. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1466–1494, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Jailbreak Large Vision-Language Models Through Multi-Modal Linkage (Wang et al., ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-long.74.pdf
