M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

Junwoo Ha, Hyunjun Kim, Sangyoon Yu, Haon Park, Ashkan Yousefpour, Yuna Park, Suhyun Kim


Abstract
We introduce a novel framework for consolidating multi-turn adversarial “jailbreak” prompts into single-turn queries, significantly reducing the manual overhead required for adversarial testing of large language models (LLMs). While multi-turn human jailbreaks have been shown to yield high attack success rates (ASRs), they demand considerable human effort and time. Our proposed Multi-turn-to-Single-turn (M2S) methods—Hyphenize, Numberize, and Pythonize—systematically reformat multi-turn dialogues into structured single-turn prompts. Despite eliminating iterative back-and-forth interactions, these reformatted prompts preserve and often enhance adversarial potency: in extensive evaluations on the Multi-turn Human Jailbreak (MHJ) dataset, M2S methods yield ASRs ranging from 70.6 % to 95.9 % across various state-of-the-art LLMs. Remarkably, our single-turn prompts outperform the original multi-turn attacks by up to 17.5 % in absolute ASR, while reducing token usage by more than half on average. Further analyses reveal that embedding malicious requests in enumerated or code-like structures exploits “contextual blindness,” undermining both native guardrails and external input-output safeguards. By consolidating multi-turn conversations into efficient single-turn prompts, our M2S framework provides a powerful tool for large-scale red-teaming and exposes critical vulnerabilities in contemporary LLM defenses. All code, data, and conversion prompts are available for reproducibility and further investigations: https://github.com/Junuha/M2S_DATA
Anthology ID:
2025.acl-long.805
Original:
2025.acl-long.805v1
Version 2:
2025.acl-long.805v2
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
16489–16507
Language:
URL:
https://preview.aclanthology.org/add-orcids-2024-eacl/2025.acl-long.805/
DOI:
10.18653/v1/2025.acl-long.805
Bibkey:
Cite (ACL):
Junwoo Ha, Hyunjun Kim, Sangyoon Yu, Haon Park, Ashkan Yousefpour, Yuna Park, and Suhyun Kim. 2025. M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16489–16507, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs (Ha et al., ACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/add-orcids-2024-eacl/2025.acl-long.805.pdf