JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs

Junjie Chu; Yugeng Liu; Ziqing Yang; Xinyue Shen; Michael Backes; Yang Zhang

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs

Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang

Abstract

Jailbreak attacks aim to bypass the LLMs’ safeguards. While researchers have proposed different jailbreak attacks in depth, they have done so in isolation—either with unaligned settings or comparing a limited range of methods. To fill this gap, we present a large-scale evaluation of various jailbreak attacks. We collect 17 representative jailbreak attacks, summarize their features, and establish a novel jailbreak attack taxonomy. Then we conduct comprehensive measurement and ablation studies across nine aligned LLMs on 160 forbidden questions from 16 violation categories. Also, we test jailbreak attacks under eight advanced defenses. Based on our taxonomy and experiments, we identify some important patterns, such as heuristic-based attacks, which could achieve high attack success rates but are easy to mitigate by defenses. Our study offers valuable insights for future research on jailbreak attacks and defenses and serves as a benchmark tool for researchers and practitioners to evaluate them effectively.

Anthology ID:: 2025.acl-long.1045
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 21538–21566
Language:
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1045/
DOI:
Bibkey:
Cite (ACL):: Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, and Yang Zhang. 2025. JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21538–21566, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs (Chu et al., ACL 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1045.pdf

PDF Cite Search Fix data