SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters

Yan Yang; Zeguan Xiao; Xin Lu; Hongru Wang; Xuetao Wei; Hailiang Huang; Guanhua Chen; Yun Chen

Abstract

The widespread application of large language models (LLMs) has raised concerns about their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SeqAR, a simple yet effective framework for designing jailbreak prompts automatically. The SeqAR framework generates and optimizes multiple jailbreak characters and then applies these characters sequentially in a single query to bypass the guardrails of the target LLM. Unlike previous work, which relies on proprietary LLMs or on seed jailbreak templates crafted by human experts, SeqAR can generate and optimize the jailbreak prompt in a cold-start scenario using open-source LLMs, without any seed jailbreak templates. Experimental results show that SeqAR achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, and we explore defense strategies against the jailbreak attacks designed by SeqAR.

Anthology ID: 2025.naacl-long.42
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 912–931
URL: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.42/
Cite (ACL): Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Xuetao Wei, Hailiang Huang, Guanhua Chen, and Yun Chen. 2025. SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 912–931, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters (Yang et al., NAACL 2025)
PDF: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.42.pdf
