Zehu Zhang

2026

ECHA: Jailbreaking LVLMs via the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment
Chenxing Xu | Junyong Jiang | Zehu Zhang | Lu Dong
Findings of the Association for Computational Linguistics: ACL 2026

Large Visual Language Models (LVLMs) achieve superior multimodal reasoning but inevitably expand the safety attack surface. While recent studies have explored emoji-based vulnerabilities, they predominantly focus on textual tokenization artifacts and neglect the model’s intrinsic capability to interpret visual semantics. In this paper, we reveal a critical systemic vulnerability termed the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment. We observe that LVLMs can implicitly synthesize holistic malicious semantics from fragmented visual cues, whereas existing guardrails fail to intercept such latent intent. To exploit this, we propose the Emoji Chain Hinting Attack (ECHA), a visual typography framework that decouples sensitive concepts into semantically related emoji chains and structural text masks. By utilizing benign scenario-based prompts to guide the decoding process, ECHA induces the model to internally reconstruct prohibited intent from abstract visual symbols, effectively bypassing surface-level safety detection. We conduct extensive red-teaming evaluations on seven state-of-the-art (SOTA) LVLMs, comprising proprietary systems such as GPT-4.1-Nano, GPT-4o-Mini, and Gemini-2.5-Flash, alongside open-source models including Qwen2.5-VL, Qwen3-VL, InternVL-3.5, and LLaVA-NeXT. Experimental results demonstrate that ECHA significantly outperforms existing baselines, successfully bypassing safety guardrails in over 81% of instances with a single attempt. Our code is available at https://github.com/KerryZack/ECHA.

