Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs

Devansh Srivastav, Xiao Zhang


Abstract
Large Language Models (LLMs) are increasingly deployed in critical domains, but their vulnerability to jailbreak attacks remains a significant concern. In this paper, we propose a multi-agent, multi-turn jailbreak strategy that systematically bypasses LLM safety mechanisms by decomposing harmful queries into seemingly benign sub-tasks. Using a role-based agentic framework consisting of a Question Decomposer, a Sub-Question Answerer, and an Answer Combiner, we demonstrate how LLMs can be manipulated into generating prohibited content without any explicit prompt manipulation. Our results show a drastic increase in attack success rate, often exceeding 90% across various LLMs, including GPT-3.5-Turbo, Gemma-2-9B, and Mistral-7B. We further analyze attack consistency across multiple runs and vulnerability across content categories. Compared with existing, widely used jailbreak techniques, our multi-agent method consistently achieves the highest attack success rate across all evaluated models. These findings reveal a critical flaw in the current safety architecture of multi-agent LLM systems: their lack of holistic context awareness. By exposing this weakness, we argue for the urgent development of multi-turn, context-aware, and robust defenses against this emerging threat vector.
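To make the pipeline structure concrete, below is a minimal sketch of the decompose-answer-combine orchestration the abstract names. This is not the authors' code: the OpenAI-style client, model name, role prompts, and function names are illustrative assumptions, and the example query is deliberately benign. The structural point it demonstrates is the one the paper highlights: each agent call runs in a fresh, isolated context, so no single model invocation sees the full task.

```python
# Hypothetical sketch of the three-role pipeline named in the abstract:
# Question Decomposer, Sub-Question Answerer, Answer Combiner.
# The API choice (openai), model, and prompts are assumptions for
# illustration only, not the authors' implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo"  # one of the models evaluated in the paper

def ask(system: str, user: str) -> str:
    """Single-turn call in a fresh context: no agent sees the full task."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def decompose(question: str) -> list[str]:
    """Question Decomposer: split a query into standalone sub-questions."""
    out = ask("Split the user's question into independent, self-contained "
              "sub-questions, one per line.", question)
    return [line.strip("- ").strip() for line in out.splitlines() if line.strip()]

def answer_sub(sub_question: str) -> str:
    """Sub-Question Answerer: each sub-question is answered in isolation."""
    return ask("Answer the question concisely.", sub_question)

def combine(question: str, answers: list[str]) -> str:
    """Answer Combiner: merge the isolated answers into one response."""
    notes = "\n".join(f"- {a}" for a in answers)
    return ask("Synthesize the notes below into a single coherent answer "
               "to the original question.",
               f"Question: {question}\nNotes:\n{notes}")

if __name__ == "__main__":
    q = "How does public-key encryption work end to end?"  # benign example
    print(combine(q, [answer_sub(s) for s in decompose(q)]))
```

As the abstract argues, a per-call safety filter sees only one benign sub-question at a time in this design, which is why the authors call for multi-turn, context-aware defenses that reason over the whole interaction rather than individual prompts.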
Anthology ID:
2025.realm-1.13
Volume:
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ehsan Kamalloo, Nicolas Gontier, Xing Han Lu, Nouha Dziri, Shikhar Murty, Alexandre Lacoste
Venues:
REALM | WS
Publisher:
Association for Computational Linguistics
Pages:
170–183
URL:
https://preview.aclanthology.org/display_plenaries/2025.realm-1.13/
Cite (ACL):
Devansh Srivastav and Xiao Zhang. 2025. Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs. In Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025), pages 170–183, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs (Srivastav & Zhang, REALM 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.realm-1.13.pdf