Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix

Ruiyang Ni; Changlong Li; Shuaibiao Han; Zhiyu Yi; Perley Xu; Wenjie Ruan

Abstract

State-of-the-art large language models (LLMs) have achieved impressive results on a wide range of tasks. However, these models are vulnerable to jailbreak attacks such as GCG and AutoDAN. Several defense strategies have been proposed to prevent LLMs from generating harmful content, most of which rely on model fine-tuning or heuristic defense designs. These methods are often time-consuming or of limited effectiveness. To fill this gap, this paper proposes a novel defense that combines online In-Context Learning (ICL) with an offline defensive suffix. Specifically, we first optimize the defensive suffix offline using an iterative algorithm. Second, an online stochastic random search identifies the most effective ICL demonstrations. Finally, the original user instruction, the selected ICL demonstrations, and the defensive suffix are assembled into a structured input prompt using a carefully designed template, and this prompt is fed into the LLM for response generation. Experimental results show that our method is effective against both advanced white-box and black-box attacks, reducing the attack success rate to nearly *0%* while maintaining the model’s utility on benign tasks and incurring only *negligible* computational overhead. Our code is available at https://github.com/Trusted-LLM/DSICL.
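
The three steps in the abstract (an offline suffix, an online random search over ICL demonstrations, and template-based assembly) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the template layout, the function names `build_prompt` and `random_search_demos`, the `score` callback, and the parameters `k` and `trials` are all assumptions made for illustration, and the suffix is treated as an already-optimized string. The actual code is in the repository linked above.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A demonstration is an (instruction, response) pair. Hypothetical type.
Demo = Tuple[str, str]


def build_prompt(instruction: str, demos: Sequence[Demo], suffix: str) -> str:
    """Assemble demonstrations, the user instruction, and the suffix.

    The layout below is an assumed template, not the one used in the paper.
    """
    parts = [f"User: {q}\nAssistant: {a}" for q, a in demos]
    parts.append(f"User: {instruction}\n{suffix}\nAssistant:")
    return "\n\n".join(parts)


def random_search_demos(
    instruction: str,
    pool: Sequence[Demo],
    suffix: str,
    score: Callable[[str], float],
    k: int = 2,
    trials: int = 20,
    seed: int = 0,
) -> List[Demo]:
    """Sample `trials` random size-k subsets of the pool and keep the best.

    `score` is a caller-supplied (hypothetical) function that rates a full
    prompt; higher is assumed to mean a stronger defense.
    """
    rng = random.Random(seed)
    best_demos = rng.sample(list(pool), k)
    best_score = score(build_prompt(instruction, best_demos, suffix))
    for _ in range(trials - 1):
        candidate = rng.sample(list(pool), k)
        candidate_score = score(build_prompt(instruction, candidate, suffix))
        if candidate_score > best_score:
            best_demos, best_score = candidate, candidate_score
    return best_demos
```

In practice the `score` callback would query the target model, for example by measuring how likely a refusal is for the assembled prompt. That choice is left open here because the abstract does not specify the search objective.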

Anthology ID: 2026.findings-acl.2123
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 42778–42797
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.2123/
Cite (ACL): Ruiyang Ni, Changlong Li, Shuaibiao Han, Zhiyu Yi, Perley Xu, and Wenjie Ruan. 2026. Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42778–42797, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Defending LLMs against Jailbreak Attacks via Template-Based ICL with a Defensive Suffix (Ni et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.2123.pdf
Checklist: 2026.findings-acl.2123.checklist.pdf
