NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi
Abstract
Large Language Models (LLMs) have revolutionized natural language processing, yet remain vulnerable to jailbreak attacks—particularly multi-turn jailbreaks that distribute malicious intent across benign exchanges, thereby bypassing alignment mechanisms. Existing approaches often suffer from limited exploration of the adversarial space, rely on hand-crafted heuristics, or lack systematic query refinement. We propose NEXUS (Network Exploration for eXploiting Unsafe Sequences), a modular framework for constructing, refining, and executing optimized multi-turn attacks. NEXUS comprises: (1) ThoughtNet, which hierarchically expands a harmful intent into a structured semantic network of topics, entities, and query chains; (2) a feedback-driven Simulator that iteratively refines and prunes these chains through attacker–victim–judge LLM collaboration using harmfulness and semantic-similarity benchmarks; and (3) a Network Traverser that adaptively navigates the refined query space for real-time attacks. This pipeline systematically uncovers stealthy, high-success adversarial paths across LLMs. Our experimental results on several closed-source and open-source LLMs show that NEXUS can achieve a higher attack success rate, between 2.1% and 19.4%, compared to state-of-the-art approaches. Our source code is available at https://github.com/inspire-lab/NEXUS.- Anthology ID:
- 2025.emnlp-main.1235
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 24278–24306
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1235/
- DOI:
- Cite (ACL):
- Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol, Eduardo Blanco, and Daniel Takabi. 2025. NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24278–24306, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks (Rafiei Asl et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1235.pdf