SemanticCamo: Jailbreaking Large Language Models through Semantic Camouflage

Jihui Yan, Xiaocui Yang, Daling Wang, Shi Feng, Yifei Zhang, Yinzhi Zhao


Abstract
The rapid development and increasingly widespread application of Large Language Models (LLMs) have made their safety issues more prominent and critical. Although safety training is widely used in LLMs, the mismatch between pre-training and safety training still leads to safety vulnerabilities. To expose these vulnerabilities and improve the safety of LLMs, we propose a novel framework, SemanticCamo, which attacks LLMs through semantic camouflage. SemanticCamo bypasses safety guardrails by replacing the original unsafe content with semantic features, thereby concealing malicious intent while keeping the query’s objectives unchanged. We conduct comprehensive experiments on state-of-the-art LLMs, including GPT-4o and Claude-3.5, finding that SemanticCamo successfully induces harmful responses from the target models in over 80% of cases on average, outperforming previous counterparts. Additionally, we evaluate the performance of SemanticCamo against various defenses, demonstrating that semantic transformations introduce critical challenges to LLM safety and necessitate targeted alignment strategies to address this vulnerability. Code and data are available at https://github.com/Jihui-Yan/SemanticCamo.
Anthology ID:
2025.findings-acl.745
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14427–14452
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.745/
Cite (ACL):
Jihui Yan, Xiaocui Yang, Daling Wang, Shi Feng, Yifei Zhang, and Yinzhi Zhao. 2025. SemanticCamo: Jailbreaking Large Language Models through Semantic Camouflage. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14427–14452, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
SemanticCamo: Jailbreaking Large Language Models through Semantic Camouflage (Yan et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.745.pdf