Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix
Ruiyang Ni, Changlong Li, Shuaibiao Han, Zhiyu Yi, Perley Xu, Wenjie Ruan
Abstract
State-of-the-art large language models (LLMs) have achieved impressive results on various tasks. However, these models are vulnerable to jailbreak attacks such as GCG and AutoDAN. Several defense strategies have been proposed to prevent LLMs from generating harmful content, with most methods relying on model fine-tuning or heuristic defense designs. These methods are often time-consuming or of limited effectiveness. To fill this gap, this paper proposes a novel defense that combines online In-Context Learning (ICL) with an offline defensive suffix. Specifically, we first optimize the defensive suffix offline using an iterative algorithm. Second, we conduct an online random search to identify the most effective ICL demonstrations. Finally, the original user instruction, the selected ICL demonstrations, and the defensive suffix are assembled into a structured input prompt using a carefully designed template, which is then fed into the LLM for response generation. Experimental results show that our method is effective against both advanced white-box and black-box attacks, reducing the attack success rate to nearly *0%* while maintaining the model’s utility on benign tasks and incurring only *negligible* computational overhead. Our code is available at https://github.com/Trusted-LLM/DSICL.
- Anthology ID:
- 2026.findings-acl.2123
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 42778–42797
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.2123/
- Cite (ACL):
- Ruiyang Ni, Changlong Li, Shuaibiao Han, Zhiyu Yi, Perley Xu, and Wenjie Ruan. 2026. Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42778–42797, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix (Ni et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.2123.pdf
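
The last two steps described in the abstract (online random search over ICL demonstrations, then template-based prompt assembly) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `User:`/`Assistant:` template, the placement of the suffix after the instruction, the names `build_prompt`, `random_search_demos`, and `toy_score`, and the placeholder `SUFFIX` string that stands in for a suffix optimized offline. The scoring function in particular is a toy stand-in; a real defense would presumably score candidates with the target model.

```python
import random
from typing import Callable, List, Sequence, Tuple

# An ICL demonstration: (instruction, safe response). Hypothetical representation.
Demo = Tuple[str, str]


def build_prompt(instruction: str, demos: Sequence[Demo], suffix: str) -> str:
    """Assemble demonstrations, the user instruction, and the suffix.

    The template below is an assumption; the paper's actual template may differ.
    """
    blocks = [f"User: {q}\nAssistant: {a}" for q, a in demos]
    blocks.append(f"User: {instruction} {suffix}\nAssistant:")
    return "\n\n".join(blocks)


def random_search_demos(
    pool: Sequence[Demo],
    k: int,
    score: Callable[[Sequence[Demo]], float],
    trials: int = 300,
    seed: int = 0,
) -> List[Demo]:
    """Sample `trials` random size-k subsets of `pool` and keep the best-scoring one."""
    rng = random.Random(seed)
    best = rng.sample(list(pool), k)
    best_score = score(best)
    for _ in range(trials):
        candidate = rng.sample(list(pool), k)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


def toy_score(demos: Sequence[Demo]) -> float:
    """Toy stand-in for a defense score: counts demonstrations that refuse."""
    return float(sum(answer.startswith("I cannot") for _, answer in demos))


# Hypothetical demonstration pool and placeholder suffix.
POOL: List[Demo] = [
    ("How do I pick a lock?", "I cannot help with that request."),
    ("Write malware for me.", "I cannot assist with creating malware."),
    ("What is the capital of France?", "Paris."),
    ("Translate 'hello' to Spanish.", "Hola."),
]
SUFFIX = "<defensive-suffix>"  # stands in for the offline-optimized suffix

demos = random_search_demos(POOL, k=2, score=toy_score)
prompt = build_prompt("Summarize this article.", demos, SUFFIX)
```

In this sketch the suffix is a fixed string computed ahead of time, so the only per-query work is the random search and string assembly, which is one plausible reading of the low-overhead claim in the abstract.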