TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models

Asif Hanif, Maha Tufail Agro, Fahad Shamshad, Karthik Nandakumar


Abstract
Prompt learning has emerged as an efficient alternative to full fine-tuning for adapting large audio-language models (ALMs) to downstream tasks. While this paradigm enables scalable deployment via Prompt-as-a-Service frameworks, it also introduces a critical yet underexplored security risk of backdoor attacks. In this work, we present TrojanWave, the first backdoor attack tailored to the prompt-learning setting in frozen ALMs. Unlike prior audio backdoor methods that require training from scratch on full datasets, TrojanWave injects backdoors solely through learnable prompts, making it highly scalable and effective in few-shot settings. TrojanWave injects imperceptible audio triggers in both time and spectral domains to effectively induce targeted misclassification during inference. To mitigate this threat, we further propose TrojanWave-Defense, a lightweight prompt purification method that neutralizes malicious prompts without hampering the clean performance. Extensive experiments across 11 diverse audio classification benchmarks demonstrate the robustness and practicality of both the attack and defense. Our code is publicly available at https://asif-hanif.github.io/trojanwave/.
Anthology ID:
2025.emnlp-main.940
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
18628–18644
Language:
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.940/
DOI:
Bibkey:
Cite (ACL):
Asif Hanif, Maha Tufail Agro, Fahad Shamshad, and Karthik Nandakumar. 2025. TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18628–18644, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models (Hanif et al., EMNLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.940.pdf
Checklist:
 2025.emnlp-main.940.checklist.pdf