Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
Abstract
Recently, autonomous agents built on large language models (LLMs) have developed rapidly and are being deployed in real-world applications. By using tools, these systems can perform actions in the real world. Given the agents' practical applications and ability to execute consequential actions, such autonomous systems can cause more severe damage than a standalone LLM if compromised. While some existing research has explored harmful actions by LLM agents, our study approaches the vulnerability from a different perspective. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios. Through attacks on implemented and deployable agents in multi-agent scenarios, we accentuate the realistic risks associated with these vulnerabilities. To mitigate such attacks, we propose self-examination defense methods. Our findings indicate these attacks are more difficult to detect than previous overtly harmful attacks, highlighting the substantial risks associated with this vulnerability.
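To make the core idea concrete, below is a minimal sketch of a repetition-based self-examination check, assuming a hypothetical agent trace represented as a list of (tool name, arguments) pairs. It illustrates the general defense direction described in the abstract, not the authors' actual method; the function name and threshold are illustrative assumptions.

```python
# Minimal sketch of a self-examination defense against malfunction
# amplification: the agent inspects its own recent action trace and
# halts if it appears stuck repeating the same action.
# NOTE: hypothetical illustration, not the paper's implementation.
from collections import Counter

def self_examine(trace, max_repeats=3):
    """Return True if any (tool, args) action repeats more than
    `max_repeats` times -- a simple proxy for the repetitive behavior
    the attack induces."""
    counts = Counter(trace)
    return any(n > max_repeats for n in counts.values())

# Example: an agent misled into re-running the same search query.
trace = [("web_search", "weather in Suzhou")] * 5
if self_examine(trace):
    print("Possible malfunction: repetitive actions detected; halting agent.")
```

A deployed check would likely also look for irrelevant actions (e.g., by comparing each action against the user's task), but even this simple repetition counter captures the amplification pattern the attack exploits.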
- Anthology ID: 2025.emnlp-main.1771
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 34952–34964
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1771/
- Cite (ACL): Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. 2025. Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34952–34964, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification (Zhang et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1771.pdf