Kihyun Kim
2026
STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming
Min Jae Jung | YongTaek Lim | Chaeyun Kim | Junghwan Kim | Kihyun Kim | Minwoo Kim
Findings of the Association for Computational Linguistics: ACL 2026
While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses. This paper introduces STAR-Teaming, a novel black-box framework for automated red teaming that effectively generates such prompts. STAR-Teaming integrates a Multi-Agent System (MAS) with a Strategy-Response Multiplex Network and employs network-driven optimization to sample effective attack strategies. This network-based approach recasts the intractable high-dimensional embedding space into a tractable structure, yielding two key advantages: it enhances the interpretability of the LLM’s strategic vulnerabilities, and it streamlines the search for effective strategies by organizing the search space into semantic communities, thereby preventing redundant exploration. Empirical results demonstrate that STAR-Teaming significantly surpasses existing methods, achieving a higher attack success rate (ASR) at a lower computational cost. Extensive experiments validate the effectiveness and explainability of the Multiplex Network. The code is available at https://github.com/selectstar-ai/STAR-Teaming-paper.
2025
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
Sunghee Jung | Donghun Lee | Shinbok Lee | Gaeun Seo | Daniel Lee | Byeongil Ko | Junrae Cho | Kihyun Kim | EungGyun Kim | Myeongcheol Shin
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLM’s dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o’s performance (94.8% in information gathering, 91% in tool call rejection) with substantial improvements over baseline (44% and 9.6% respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.