Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs

Devansh Srivastav; Xiao Zhang (张晓)

doi:10.18653/v1/2025.realm-1.13

Abstract

Large Language Models (LLMs) are increasingly deployed in critical domains, but their vulnerability to jailbreak attacks remains a significant concern. In this paper, we propose a multi-agent, multi-turn jailbreak strategy that systematically bypasses LLM safety mechanisms by decomposing harmful queries into seemingly benign sub-tasks. Built on a role-based agentic framework consisting of a Question Decomposer, a Sub-Question Answerer, and an Answer Combiner, our approach shows how LLMs can be led to generate prohibited content without any explicit prompt manipulation. Our results show a drastic increase in attack success rate, often exceeding 90% across various LLMs, including GPT-3.5-Turbo, Gemma-2-9B, and Mistral-7B. We further analyze attack consistency across multiple runs and vulnerability across content categories. Compared with existing widely used jailbreak techniques, our multi-agent method consistently achieves the highest attack success rate across all evaluated models. These findings reveal a critical flaw in the current safety architecture of multi-agent LLM systems: their lack of holistic context awareness. In light of this weakness, we argue for the urgent development of multi-turn, context-aware, and robust defenses against this emerging threat vector.

Anthology ID:: 2025.realm-1.13
Volume:: Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Ehsan Kamalloo, Nicolas Gontier, Xing Han Lu, Nouha Dziri, Shikhar Murty, Alexandre Lacoste
Venues:: REALM | WS
Publisher:: Association for Computational Linguistics
Pages:: 170–183
URL:: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.realm-1.13/
Cite (ACL):: Devansh Srivastav and Xiao Zhang. 2025. Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs. In Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025), pages 170–183, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs (Srivastav & Zhang, REALM 2025)
PDF:: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.realm-1.13.pdf