DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent

Pengyu Zhu; Zhenhong Zhou; Yuanhe Zhang; Shilinlu Yan; Kun Wang; Sen Su

doi:10.18653/v1/2025.findings-emnlp.157

Abstract

As LLM-based agents become increasingly prevalent, triggers implanted in user queries or environment feedback can activate hidden backdoors, raising critical concerns about safety vulnerabilities in agents. However, traditional backdoor attacks are often detectable by safety audits that analyze the reasoning process of agents, hindering further progress in agent safety research. To this end, we propose a novel backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. With these advancements, backdoors can bypass safety audits far more effectively. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets demonstrate that our method achieves an attack success rate approaching 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.

Anthology ID: 2025.findings-emnlp.157
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2890–2912
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.157/
DOI: 10.18653/v1/2025.findings-emnlp.157
Cite (ACL): Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, and Sen Su. 2025. DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2890–2912, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent (Zhu et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.157.pdf
Checklist: 2025.findings-emnlp.157.checklist.pdf
