When Allies Turn Foes: Exploring Group Characteristics of LLM-Based Multi-Agent Collaborative Systems Under Adversarial Attacks
Jiahao Zhang, Baoshuo Kan, Tao Gong, Fu Lee Wang, Tianyong Hao
Abstract
This paper investigates the group characteristics of multi-agent collaborative systems under adversarial attacks. Adversarial agents are tasked with generating counterfactual answers to a given collaborative problem, while collaborative agents interact normally with other agents to solve it. To simulate real-world collaboration as closely as possible, we evaluate the collaborative system in three collaboration scenarios and design three communication strategies and different group structures. Furthermore, we explore several methods to mitigate adversarial attacks, all of which prove effective in our experiments. To quantify the robustness of collaborative systems against such attacks, we introduce a novel metric, the System Defense Index (SDI). Finally, we conduct an in-depth analysis, from the perspective of group dynamics, of how adversarial agents affect multi-agent collaborative systems, which reveals similarities between the agent collaboration process and the human collaboration process. The code will be made available after publication.
- Anthology ID:
- 2025.findings-emnlp.333
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6275–6300
- URL:
- https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.333/
- DOI:
- 10.18653/v1/2025.findings-emnlp.333
- Cite (ACL):
- Jiahao Zhang, Baoshuo Kan, Tao Gong, Fu Lee Wang, and Tianyong Hao. 2025. When Allies Turn Foes: Exploring Group Characteristics of LLM-Based Multi-Agent Collaborative Systems Under Adversarial Attacks. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 6275–6300, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- When Allies Turn Foes: Exploring Group Characteristics of LLM-Based Multi-Agent Collaborative Systems Under Adversarial Attacks (Zhang et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.333.pdf