SubmissionNumber#=%=#5 FinalPaperTitle#=%=#RedHit: Adaptive Red-Teaming of Large Language Models via Search, Reasoning, and Preference Optimization ShortPaperTitle#=%=# NumberOfPages#=%=#10 CopyrightSigned#=%=#ALI DEHGHANTANHA JobTitle#==# Organization#==#University of Guelph, ON, Canada Abstract#==#Red-teaming has become a critical component of Large Language Model (LLM) security amid increasingly sophisticated adversarial techniques. However, existing methods often depend on hard-coded strategies that quickly become obsolete against novel attack patterns, requiring constant updates. Moreover, current automated red-teaming approaches typically lack effective reasoning capabilities, leading to lower attack success rates and longer training times. In this paper, we propose RedHit, a multi-round, automated, and adaptive red-teaming framework that integrates Monte Carlo Tree Search (MCTS), Chain-of-Thought (CoT) reasoning, and Direct Preference Optimization (DPO) to enhance the adversarial capabilities of an Adversarial LLM (ALLM). RedHit formulates prompt injection as a tree search problem, where the goal is to discover adversarial prompts capable of bypassing target model defenses. Each search step is guided by an Evaluator module that dynamically scores model responses using multi-detector feedback, yielding fine-grained reward signals. MCTS is employed to explore the space of adversarial prompts, incrementally constructing a Prompt Search Tree (PST) in which each node stores an adversarial prompt, its response, a reward, and other control properties. Prompts are generated via a locally hosted IndirectPromptGenerator module, which uses CoT-enabled prompt transformation to create multi-perspective, semantically equivalent variants for deeper tree exploration. CoT reasoning improves MCTS exploration by injecting strategic insights derived from past interactions, enabling RedHit to adapt dynamically to the target LLM's defenses.
Furthermore, DPO fine-tunes the ALLM using preference data collected from previous attack rounds, progressively enhancing its ability to generate more effective prompts. RedHit leverages the Garak framework to evaluate each adversarial prompt and compute rewards, demonstrating robust and adaptive adversarial behavior across multiple attack rounds. Author{1}{Firstname}#=%=#Mohsen Author{1}{Lastname}#=%=#Sorkhpour Author{1}{Email}#=%=#msorkhpo@uoguelph.ca Author{1}{Affiliation}#=%=#Cyber Science Lab, University of Guelph Author{2}{Firstname}#=%=#Abbas Author{2}{Lastname}#=%=#Yazdinejad Author{2}{Email}#=%=#ayazdine@uoguelph.ca Author{2}{Affiliation}#=%=#Cyber Science Lab, University of Guelph Author{3}{Firstname}#=%=#Ali Author{3}{Lastname}#=%=#Dehghantanha Author{3}{Username}#=%=#alidehghantanha Author{3}{Email}#=%=#adehghan@uoguelph.ca Author{3}{Affiliation}#=%=#University of Guelph ==========