Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Zhixue Song; Boyan Han; Yiwei Wang; Chi Zhang

Abstract

Recent advancements in visual context compression enable multimodal large language models (MLLMs) to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of state-of-the-art (SOTA) models deteriorate sharply as resolution degrades and that, surprisingly, this deterioration persists even when the text remains legible. We attribute this to “Cognitive Overload”, hypothesizing that the effort required to decipher degraded inputs diverts attentional resources from safety auditing. The phenomenon is consistent across other visual perturbations, including noise and geometric distortion. To address it, we propose a simple “Structured Cognitive Offloading” strategy that mitigates these risks by enforcing a serialized pipeline that decouples visual transcription from safety assessment. Our work exposes a significant risk in vision-based compression and provides critical insights for the secure design of future MLLMs.
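
The abstract does not say how the serialized pipeline is implemented. As a rough illustration only, the idea of separating transcription from safety assessment might be sketched as below. The function `answer_with_offloading`, the constant `REFUSAL`, and the three callables `transcribe`, `is_unsafe`, and `respond` are hypothetical names introduced here, not the authors' code or any library's API.

```python
from typing import Callable

# Hypothetical refusal message; not taken from the paper.
REFUSAL = "I can't help with that request."


def answer_with_offloading(
    image: bytes,
    transcribe: Callable[[bytes], str],
    is_unsafe: Callable[[str], bool],
    respond: Callable[[str], str],
) -> str:
    """Run transcription, safety check, and response as separate serial stages.

    All three callables are assumptions standing in for model calls.
    """
    # Stage 1: transcription only. The model is asked to read the
    # (possibly degraded) image, not to act on its content.
    text = transcribe(image)

    # Stage 2: safety assessment on the plain transcript, so the check
    # does not compete with the effort of deciphering the image.
    if is_unsafe(text):
        return REFUSAL

    # Stage 3: answer only after the transcript has passed the check.
    return respond(text)
```

In this sketch, each stage receives only the output of the previous one, so the response stage is never reached for a transcript that the safety stage has flagged.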

Anthology ID: 2026.findings-acl.983
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19643–19658
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.983/
Cite (ACL): Zhixue Song, Boyan Han, Yiwei Wang, and Chi Zhang. 2026. Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19643–19658, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment (Song et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.983.pdf
Checklist: 2026.findings-acl.983.checklist.pdf
