VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models

Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He


Abstract
The emergence of Multimodal Large Reasoning Models (MLRMs) has enabled sophisticated visual reasoning capabilities by integrating reinforcement learning and Chain-of-Thought (CoT) supervision. However, while these enhanced reasoning capabilities improve performance, they also introduce new and underexplored safety risks. In this work, we systematically investigate the security implications of advanced visual reasoning in MLRMs. Our analysis reveals a fundamental trade-off: as visual reasoning improves, models become more vulnerable to jailbreak attacks. Motivated by this critical finding, we introduce VisCRA (Visual Chain Reasoning Attack), a novel jailbreak framework that exploits the visual reasoning chains to bypass safety mechanisms. VisCRA combines targeted visual attention masking with a two-stage reasoning induction strategy to precisely control harmful outputs. Extensive experiments demonstrate VisCRA’s significant effectiveness, achieving high attack success rates on leading closed-source MLRMs: 76.48% on Gemini 2.0 Flash Thinking, 68.56% on QvQ-Max, and 56.60% on GPT-4o. Our findings highlight a critical insight: the very capability that empowers MLRMs — their visual reasoning — can also serve as an attack vector, posing significant security risks. Warning: This paper contains unsafe examples.
Anthology ID: 2025.emnlp-main.312
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6142–6155
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.312/
Cite (ACL):
Bingrui Sima, Linhua Cong, Wenxuan Wang, and Kun He. 2025. VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 6142–6155, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models (Sima et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.312.pdf
Checklist: 2025.emnlp-main.312.checklist.pdf