Geyuan Zhang


2025

Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
Yu Wang | Xiaofei Zhou | Yichen Wang | Geyuan Zhang | Tianxing He
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the rapid advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Prior research has exposed VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, current jailbreak methods often fail against cutting-edge models such as GPT-4o. We attribute this to the over-exposure of harmful content and the absence of stealthy malicious guidance. In this work, we introduce a novel jailbreak framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML employs an encryption-decryption process across text and image modalities to mitigate the over-exposure of malicious information. To covertly align the model's output with harmful objectives, MML leverages a technique we term evil alignment, framing the attack within the narrative context of a video game development scenario. Extensive experiments validate the effectiveness of MML. Specifically, MML jailbreaks GPT-4o with attack success rates of 99.40% on SafeBench, 98.81% on MM-SafeBench, and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML.