Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models

Zirui Song; Qian Jiang; Mingxuan Cui; Mingzhe Li; Lang Gao; Zeyu Zhang; Zixiang Xu; Yanbo Wang; Guangxian Ouyang; Zhenhao Chen; Xiuying Chen

Abstract

The rise of Large Audio-Language Models (LAMs) brings both potential and risks, as their audio outputs may contain harmful or unethical content. However, current research lacks a systematic, quantitative evaluation of LAM safety, especially against jailbreak attacks, which are challenging due to the temporal and semantic nature of speech. To bridge this gap, we introduce AJailBench, the first benchmark specifically designed to evaluate jailbreak vulnerabilities in LAMs. We begin by constructing AJailBench-Base, a dataset of 1,495 adversarial audio prompts spanning 10 policy-violating categories. Using this dataset, we evaluate several state-of-the-art LAMs and reveal that none exhibit consistent robustness across attacks. To further strengthen jailbreak testing and simulate more realistic attack conditions, we propose a method to generate dynamic adversarial variants. Our Audio Perturbation Toolkit (APT) applies targeted distortions across time, frequency, and amplitude domains. To preserve the original jailbreak intent, we enforce a semantic consistency constraint and employ Bayesian optimization to efficiently search for perturbations that are both subtle and highly effective. This results in AJailBench-APT+, an extended dataset of optimized adversarial audio samples. Our findings demonstrate that even small, semantically preserved perturbations can significantly reduce the safety performance of leading LAMs, underscoring the need for more robust and semantically aware defense mechanisms. We release AJailBench to facilitate future research: https://anonymous.4open.science/r/AudioJailbreak-4262/
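
To make the idea of searching over time, frequency, and amplitude distortions under a consistency constraint more concrete, the following is a minimal, dependency-free sketch. It is not the authors' APT implementation: every function name, the parameter ranges, and the two user-supplied callables `objective` and `similarity` are hypothetical, and plain random search is substituted for Bayesian optimization to keep the example short.

```python
import math
import random


def scale_amplitude(samples, gain):
    """Amplitude-domain distortion: apply a gain and clip to [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def stretch_time(samples, rate):
    """Time-domain distortion: resample by linear interpolation (rate > 1 shortens)."""
    n_out = max(1, int(len(samples) / rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def low_pass(samples, alpha):
    """Frequency-domain distortion: one-pole low-pass filter (alpha = 1 is a no-op)."""
    out, prev = [], 0.0
    for s in samples:
        prev = prev + alpha * (s - prev)
        out.append(prev)
    return out


def apply_perturbation(samples, params):
    """Chain the three distortions; params is (gain, rate, alpha)."""
    gain, rate, alpha = params
    return low_pass(stretch_time(scale_amplitude(samples, gain), rate), alpha)


def constrained_random_search(samples, objective, similarity,
                              min_similarity, trials=50, seed=0):
    """Random-search stand-in for Bayesian optimization (an assumption of this sketch).

    `objective(candidate)` returns a score to maximize, and
    `similarity(original, candidate)` returns a consistency score; both are
    hypothetical callables supplied by the caller.
    """
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(trials):
        # Assumed, illustrative ranges for small distortions.
        params = (rng.uniform(0.8, 1.2), rng.uniform(0.9, 1.1), rng.uniform(0.5, 1.0))
        candidate = apply_perturbation(samples, params)
        if similarity(samples, candidate) < min_similarity:
            continue  # reject candidates that violate the consistency constraint
        score = objective(candidate)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In this sketch the constraint is enforced by rejecting candidates before they are scored, so the search only ever returns a perturbation that passed the consistency check. A Bayesian optimizer would replace the uniform sampling with a model-guided proposal step while keeping the same accept/reject structure.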

Anthology ID: 2026.acl-long.1259
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 27294–27308
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1259/
Cite (ACL): Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Guangxian Ouyang, Zhenhao Chen, and Xiuying Chen. 2026. Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27294–27308, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models (Song et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1259.pdf
Checklist: 2026.acl-long.1259.checklist.pdf
