STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

Min Jae Jung; YongTaek Lim; Chaeyun Kim; Junghwan Kim; Kihyun Kim; Minwoo Kim

Abstract

While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses. This paper introduces STAR-Teaming, a novel black-box framework for automated red teaming that effectively generates such prompts. STAR-Teaming integrates a Multi-Agent System (MAS) with a Strategy-Response Multiplex Network and employs network-driven optimization to sample effective attack strategies. This network-based approach recasts the intractable high-dimensional embedding space into a tractable structure, yielding two key advantages: it enhances the interpretability of the LLM’s strategic vulnerabilities, and it streamlines the search for effective strategies by organizing the search space into semantic communities, thereby preventing redundant exploration. Empirical results demonstrate that STAR-Teaming significantly surpasses existing methods, achieving a higher attack success rate (ASR) at a lower computational cost. Extensive experiments validate the effectiveness and explainability of the Multiplex Network. The code is available at https://github.com/selectstar-ai/STAR-Teaming-paper.

Anthology ID: 2026.findings-acl.1470
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 29406–29435
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1470/
Cite (ACL): Min Jae Jung, YongTaek Lim, Chaeyun Kim, Junghwan Kim, Kihyun Kim, and Minwoo Kim. 2026. STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29406–29435, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming (Jung et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1470.pdf
Checklist: 2026.findings-acl.1470.checklist.pdf
