Sen Su


2025

Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging
Tingfeng Hui | Zhenyu Zhang | Shuohuan Wang | Yu Sun | Hua Wu | Sen Su
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Mixture-of-Experts (MoE) architectures shine in large language models (LLMs) and demonstrate outstanding performance on a wide range of natural language processing tasks. However, existing methods for transforming LLMs from dense to MoE face significant data requirements and typically rely on large-scale post-training. In this paper, we propose Upcycling Instruction Tuning (UpIT), a data-efficient approach for tuning a dense pre-trained model into an MoE instruction model. Specifically, we first point out that intermediate checkpoints saved during instruction tuning of the dense model are naturally suited to serve as specialized experts, and then propose an expert expansion stage to flexibly build models with arbitrary numbers of experts, where a genetic algorithm and parameter merging are introduced to ensure sufficient diversity among the newly extended experts. To ensure that each specialized expert in the MoE model works as expected, we select a small amount of seed data at which each expert excels to pre-optimize the router. Extensive experiments with various data scales and upcycling settings demonstrate the outstanding performance and data efficiency of UpIT, as well as stable improvement under expert or data scaling. Further analysis reveals the importance of ensuring expert diversity in upcycling.
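The checkpoint-merging idea above can be illustrated with a minimal, hypothetical sketch (not the UpIT implementation): linearly interpolating the parameters of two intermediate checkpoints to derive a new, distinct expert. Parameter names and the merge rule here are assumptions for illustration only.

```python
# Toy sketch of parameter merging between two checkpoints.
# Hypothetical illustration only -- not the UpIT implementation; the
# tensor layout (flat lists) and the linear merge rule are assumptions.

def merge_checkpoints(ckpt_a, ckpt_b, alpha=0.5):
    """Linearly interpolate two state dicts (dicts of flat parameter lists)."""
    merged = {}
    for name, params_a in ckpt_a.items():
        params_b = ckpt_b[name]
        merged[name] = [alpha * a + (1 - alpha) * b
                        for a, b in zip(params_a, params_b)]
    return merged

# Two tiny "checkpoints", each with one parameter tensor as a flat list.
ckpt1 = {"ffn.weight": [1.0, 2.0, 3.0]}
ckpt2 = {"ffn.weight": [3.0, 4.0, 5.0]}
new_expert = merge_checkpoints(ckpt1, ckpt2, alpha=0.5)
print(new_expert["ffn.weight"])  # [2.0, 3.0, 4.0]
```

Varying `alpha` (or combining different checkpoint pairs, as a genetic algorithm might) yields a family of distinct expert initializations from a fixed set of checkpoints.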

DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models
Rui Ha | Chaozhuo Li | Rui Pu | Litian Zhang | Xi Zhang | Sen Su
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have shown strong potential in complex reasoning tasks. However, as task complexity increases, their performance often degrades, resulting in hallucinations, errors, and logical inconsistencies. To enhance reasoning capabilities, Monte Carlo Tree Search (MCTS) has been introduced to guide the exploration of reasoning paths in a structured manner. Despite its advantages, traditional MCTS relies on fixed reasoning strategies, limiting the diversity of reasoning paths and the coverage of the solution space. To address these limitations, we propose Dynamic Strategy-Guided MCTS (DSG-MCTS), a novel framework that dynamically integrates multiple reasoning strategies, such as abductive and analogical reasoning, to expand the reasoning space. At the same time, DSG-MCTS enhances reasoning efficiency through a dynamic strategy selection mechanism that adapts to the task context. Experimental results on challenging reasoning benchmarks demonstrate that DSG-MCTS achieves improved accuracy and efficiency, outperforming existing state-of-the-art methods.
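The dynamic strategy selection described above can be sketched generically with a bandit-style rule that balances trying each reasoning strategy against exploiting the one that has worked best. This is a hypothetical toy illustration, not the DSG-MCTS algorithm; the strategy names and the UCB1 scoring are assumptions.

```python
# Toy sketch of dynamic strategy selection for tree search.
# Hypothetical illustration only; not the DSG-MCTS selection mechanism.

import math

STRATEGIES = ["deductive", "abductive", "analogical"]

def ucb_select(stats, total_visits, c=1.4):
    """Pick the strategy with the highest UCB1 score.

    stats maps strategy -> (visit_count, cumulative_reward).
    """
    best, best_score = None, float("-inf")
    for s in STRATEGIES:
        visits, reward = stats[s]
        if visits == 0:
            return s  # try each strategy at least once
        score = reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best, best_score = s, score
    return best

stats = {"deductive": (10, 6.0), "abductive": (5, 4.0), "analogical": (0, 0.0)}
print(ucb_select(stats, total_visits=15))  # "analogical" (still unvisited)
```

In a search loop, the selected strategy would guide how a node is expanded, and its (visits, reward) entry would be updated after each simulation.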

Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
Yuanhe Zhang | Zhenhong Zhou | Wei Zhang | Xinyue Wang | Xiaojun Jia | Yang Liu | Sen Su
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks yet remain vulnerable to external threats, particularly LLM Denial-of-Service (LLM-DoS) attacks, which aim to exhaust computational resources and block services. However, existing studies predominantly focus on white-box attacks, leaving black-box scenarios underexplored. In this paper, we introduce the Auto-Generation for LLM-DoS (AutoDoS) attack, an automated algorithm designed for black-box LLMs. AutoDoS constructs a DoS Attack Tree and expands its node coverage to achieve effectiveness under black-box conditions. Through transferability-driven iterative optimization, AutoDoS can work across different models with a single prompt. Furthermore, we reveal that embedding a Length Trojan allows AutoDoS to bypass existing defenses more effectively. Experimental results show that AutoDoS significantly amplifies service response latency by over 250×, leading to severe resource consumption in terms of GPU utilization and memory usage. Our work provides a new perspective on LLM-DoS attacks and security defenses.

DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
Pengyu Zhu | Zhenhong Zhou | Yuanhe Zhang | Shilinlu Yan | Kun Wang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025

As LLM-based agents become increasingly prevalent, triggers implanted in user queries or environment feedback can activate hidden backdoors, raising critical concerns about safety vulnerabilities in agents. However, traditional backdoor attacks are often detectable by safety audits that analyze the reasoning process of agents, hindering further progress in agent safety research. To this end, we propose a novel backdoor implantation strategy called the Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. Together, these techniques allow backdoors to largely bypass safety audits. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets demonstrate that our method achieves an attack success rate approaching 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.

PD3F: A Pluggable and Dynamic DoS-Defense Framework against resource consumption attacks targeting Large Language Models
Yuanhe Zhang | Xinyue Wang | Haoran Gao | Zhenhong Zhou | Fanyu Meng | Yuyao Zhang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs), due to substantial computational requirements, are vulnerable to resource consumption attacks, which can severely degrade server performance or even cause crashes, as demonstrated by denial-of-service (DoS) attacks designed for LLMs. However, existing works lack mitigation strategies against such threats, resulting in unresolved security risks for real-world LLM deployments. To this end, we propose the Pluggable and Dynamic DoS-Defense Framework (PD3F), which employs a two-stage approach to defend against resource consumption attacks from both the input and output sides. On the input side, we propose the Resource Index to guide Dynamic Request Polling Scheduling, thereby reducing computing resource usage induced by malicious prompts under high-concurrency scenarios. On the output side, we introduce the Adaptive End-Based Suppression mechanism, which reduces excessive malicious generation. Experiments across six models demonstrate that PD3F significantly mitigates resource consumption attacks, improving users’ access capacity by up to 500% during adversarial load. PD3F represents a step toward the resilient and resource-aware deployment of LLMs against resource consumption attacks.
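The input-side scheduling idea above can be sketched with a toy priority queue: requests with a lower predicted resource cost are served first, so expensive (potentially malicious) requests cannot starve cheap ones under load. This is a hypothetical illustration; the name `resource_index` and the min-heap policy are assumptions, not the PD3F scheduler.

```python
# Toy sketch of request polling ordered by a "resource index".
# Hypothetical illustration only; not the PD3F Dynamic Request Polling
# Scheduling algorithm. Lower index = predicted cheaper to serve.

import heapq

def schedule(requests):
    """requests: list of (resource_index, request_id) tuples.

    Returns request ids in serving order (cheapest first).
    """
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        _, rid = heapq.heappop(heap)
        order.append(rid)
    return order

print(schedule([(0.9, "r1"), (0.2, "r2"), (0.5, "r3")]))  # ['r2', 'r3', 'r1']
```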

Knowledge Graph-Driven Memory Editing with Directional Interventions
Jinhu Fu | Kun Wang | Chongye Guo | Junfeng Fang | Wentao Zhang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) have revolutionized language processing and understanding, yet their performance is hampered by inaccuracies and outdated information. Model editing techniques offer a solution but face two key challenges: **(I)** most methods inject knowledge through rigidly constructed losses, which leads to poor compatibility with higher-order multi-hop problems; **(II)** the locate-then-edit line of work, by altering pre-trained parameters, inevitably affects normal knowledge and may even suffer catastrophic forgetting. In this paper, we introduce **KGMET**, a framework that constructs knowledge graphs from available information to guide the direction of knowledge editing, enabling **consistent**, **aligned**, and **stable** updates in **large-scale** editing scenarios. Furthermore, *KGMET* goes beyond this by employing orthogonal constraints to block interference from irrelevant information, ensuring that updates are both controllable and generalizable. Experiments on the Multi-Counterfact, ZsRE, and MQuAKE datasets using the *Llama-3-8B*, *GPT-J-6B*, and *GPT-2-XL* models showcase improvements over state-of-the-art methods, with gains of 5%–17% on multi-hop tasks while remaining generalizable (at least a 20% gain in fluency). Our code is available on GitHub.
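An orthogonal constraint of the kind described above can be illustrated with a minimal, hypothetical sketch: projecting a parameter update so that it has no component along a "protected" knowledge direction, leaving that direction untouched. This is a toy illustration of the general technique, not the KGMET procedure.

```python
# Toy sketch of an orthogonal constraint on a parameter update.
# Hypothetical illustration only; not the KGMET editing procedure.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(update, protected):
    """Remove the component of `update` along the `protected` direction."""
    scale = dot(update, protected) / dot(protected, protected)
    return [u - scale * p for u, p in zip(update, protected)]

update = [1.0, 1.0]      # proposed edit direction
protected = [1.0, 0.0]   # direction encoding knowledge to preserve
print(project_out(update, protected))  # [0.0, 1.0]
```

After projection, the update is orthogonal to the protected direction, so applying it cannot change the model's behavior along that direction (to first order).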

2024

Alignment-Enhanced Decoding: Defending Jailbreaks via Token-Level Adaptive Refining of Probability Distributions
Quan Liu | Zhenhong Zhou | Longzhu He | Yi Liu | Wei Zhang | Sen Su
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root cause of jailbreaks. We first define the Competitive Index to quantify alignment failures and use feedback from self-evaluation to compute post-alignment logits. AED then adaptively combines the Competitive Index and the post-alignment logits with the original logits to obtain a harmless yet helpful distribution. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach.
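The token-level combination described above can be sketched generically: blend the model's original next-token logits with post-alignment logits, weighting the latter more heavily when an alignment failure looks likely. This is a hypothetical toy, not the AED formula; the clamped linear weighting is an assumption for illustration.

```python
# Toy sketch of blending original and post-alignment logits per token.
# Hypothetical illustration only; the weighting rule is an assumption,
# not the AED combination from the paper.

import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def blend_logits(original, post_alignment, competitive_index):
    """Weight post-alignment logits more when alignment failure is likely."""
    w = min(max(competitive_index, 0.0), 1.0)  # clamp weight to [0, 1]
    return [(1 - w) * o + w * p for o, p in zip(original, post_alignment)]

orig = [2.0, 0.5, -1.0]     # raw next-token logits
aligned = [0.0, 1.5, -1.0]  # logits after self-evaluation feedback
probs = softmax(blend_logits(orig, aligned, competitive_index=0.8))
print([round(p, 3) for p in probs])
```

With a competitive index of 0 the original distribution is recovered unchanged; with 1 the post-alignment distribution fully takes over.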

2021

Treasures Outside Contexts: Improving Event Detection via Global Statistics
Rui Li | Wenlin Zhao | Cheng Yang | Sen Su
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Event detection (ED) aims to identify event instances of specified types in given texts and has been formalized as a sequence labeling task. To our knowledge, existing neural ED models make decisions relying entirely on the contextual semantic features of each word in the input text, which we find are easily confused by varied contexts at test time. To this end, we introduce a set of statistical features derived from word-event co-occurrence frequencies over the entire training set to cooperate with contextual features. Specifically, we propose the Semantic and Statistic-Joint Discriminative Network (SS-JDN), consisting of a semantic feature extractor, a statistical feature extractor, and a joint event discriminator. In experiments, SS-JDN effectively exceeds ten recent strong baselines on the ACE2005 and KBP2015 datasets. Further, we perform extensive experiments to comprehensively probe SS-JDN.
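The word-event co-occurrence statistics described above can be sketched with a toy counter over a labeled corpus. This is a hypothetical illustration only; the actual SS-JDN statistical feature extractor is richer, and the event-type labels here are invented examples in the style of ACE annotations.

```python
# Toy sketch of word-event co-occurrence counting over a training set.
# Hypothetical illustration only; not the SS-JDN feature extractor.

from collections import Counter, defaultdict

def build_cooccurrence(corpus):
    """Count how often each word occurs in sentences labeled with each event type.

    corpus: list of (word_list, event_type) pairs.
    """
    counts = defaultdict(Counter)
    for words, event_type in corpus:
        for w in words:
            counts[w][event_type] += 1
    return counts

train = [
    (["troops", "attacked", "the", "city"], "Conflict.Attack"),
    (["the", "city", "was", "attacked"], "Conflict.Attack"),
    (["she", "was", "elected", "mayor"], "Personnel.Elect"),
]
stats = build_cooccurrence(train)
print(stats["attacked"]["Conflict.Attack"])  # 2
```

At inference, such counts (suitably normalized) give a context-independent prior over event types for each word, complementing the contextual features.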