Reference Attack: A New Cross-Modal Jailbreaking Attack against Multimodal Large Language Models

Yulong Wang, Yifei Fu, Jiayi Gao


Abstract
Red team testing, an effective proactive method for evaluating the security of multimodal large language models (MLLMs), requires a toolkit that expands alongside the development of MLLM safeguards. We propose the Reference Attack, a powerful tool for red team testing against MLLMs. The Reference Attack is a reference-guided cross-modal jailbreak method that enhances existing prompt-to-image injection attacks by exploiting MLLMs’ semantic reconstruction capabilities. Our method embeds malicious prompts in non-text modalities (e.g., images, spreadsheets) and constructs recursive symbolic references in text, enabling MLLMs to gradually recover and generate harmful content through layered reference resolution. The attack introduces a new vector that circumvents conventional content moderation by exploiting MLLMs’ lack of security checks during cross-modal reference resolution. We evaluate the Reference Attack on leading MLLMs, including ChatGPT, Gemini, Claude, and the widely used open-source LLaMA model, and achieve an attack success rate of over 93% across all tested models. Compared to state-of-the-art attacks, the Reference Attack achieves higher success rates than all baselines under identical evaluation, with a maximum gain of 70.8%. Our study reveals a critical gap in MLLM security and highlights the need for strict security auditing of cross-modal interactions in future content moderation.
Anthology ID:
2026.acl-long.812
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
17860–17881
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.812/
Cite (ACL):
Yulong Wang, Yifei Fu, and Jiayi Gao. 2026. Reference Attack: A New Cross-Modal Jailbreaking Attack against Multimodal Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17860–17881, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Reference Attack: A New Cross-Modal Jailbreaking Attack against Multimodal Large Language Models (Wang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.812.pdf
Checklist:
2026.acl-long.812.checklist.pdf