ECHA: Jailbreaking LVLMs via the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment

Chenxing Xu, Junyong Jiang, Zehu Zhang, Lu Dong


Abstract
Large Visual Language Models (LVLMs) achieve superior multimodal reasoning but inevitably expand the safety attack surface. While recent studies have explored emoji-based vulnerabilities, they predominantly focus on textual tokenization artifacts and neglect the model’s intrinsic capability to interpret visual semantics. In this paper, we reveal a critical systemic vulnerability termed the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment. We observe that LVLMs can implicitly synthesize holistic malicious semantics from fragmented visual cues, whereas existing guardrails fail to intercept such latent intent. To exploit this, we propose the Emoji Chain Hinting Attack (ECHA), a visual typography framework that decouples sensitive concepts into semantically related emoji chains and structural text masks. By utilizing benign scenario-based prompts to guide the decoding process, ECHA induces the model to internally reconstruct prohibited intent from abstract visual symbols, effectively bypassing surface-level safety detection. We conduct extensive red-teaming evaluations on seven state-of-the-art (SOTA) LVLMs, comprising proprietary systems such as GPT-4.1-Nano, GPT-4o-Mini, and Gemini-2.5-Flash, alongside open-source models including Qwen2.5-VL, Qwen3-VL, InternVL-3.5, and LLaVA-NeXT. Experimental results demonstrate that ECHA significantly outperforms existing baselines, successfully bypassing safety guardrails in over 81% of instances with a single attempt. Our code is available at https://github.com/KerryZack/ECHA.
Anthology ID:
2026.findings-acl.893
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17974–17990
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.893/
Cite (ACL):
Chenxing Xu, Junyong Jiang, Zehu Zhang, and Lu Dong. 2026. ECHA: Jailbreaking LVLMs via the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment. In Findings of the Association for Computational Linguistics: ACL 2026, pages 17974–17990, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ECHA: Jailbreaking LVLMs via the Mismatch between Implicit Semantic Reconstruction and Explicit Safety Alignment (Xu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.893.pdf
Checklist:
 2026.findings-acl.893.checklist.pdf