2025
Stepwise Reasoning Error Disruption Attack of LLMs
Jingyu Peng | Maolin Wang | Xiangyu Zhao | Kai Zhang | Wanyu Wang | Pengyue Jia | Qidong Liu | Ruocheng Guo | Qi Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have made remarkable strides in complex reasoning tasks, but their safety and robustness in reasoning processes remain underexplored, particularly on third-party platforms that facilitate user interactions via APIs. Existing attacks on LLM reasoning are constrained by specific settings or a lack of imperceptibility, limiting their feasibility and generalizability. To address these challenges, we propose the Stepwise rEasoning Error Disruption (SEED) attack, which subtly injects errors into prior reasoning steps to mislead the model into producing incorrect subsequent reasoning and final answers. Unlike previous methods, SEED is compatible with zero-shot and few-shot settings, maintains the natural reasoning flow, and ensures covert execution without modifying the instruction. Extensive experiments on four datasets across four different models demonstrate SEED's effectiveness, revealing the vulnerability of LLMs to disruptions in their reasoning processes. These findings underscore the need for greater attention to the robustness of LLM reasoning to ensure safety in practical applications. Our code is available at: https://github.com/Applied-Machine-Learning-Lab/SEED-Attack
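As a rough illustration of the attack pattern this abstract describes, the sketch below corrupts one intermediate chain-of-thought step and asks the model to continue from the poisoned prefix. This is not the authors' implementation: the numeric perturbation rule and the `query_model` callable are placeholders assumed purely for illustration.

```python
# Illustrative sketch of a stepwise reasoning disruption: perturb one
# intermediate step, then let the model continue from the corrupted prefix.
# `query_model` is a hypothetical stand-in for any LLM completion API.
import re
import random

def perturb_step(step: str) -> str:
    """Swap one number in a reasoning step for a nearby wrong value."""
    numbers = re.findall(r"\d+", step)
    if not numbers:
        return step
    target = random.choice(numbers)
    wrong = str(int(target) + random.choice([-2, -1, 1, 2]))
    return step.replace(target, wrong, 1)

def stepwise_disruption(question: str, steps: list[str], attack_index: int,
                        query_model) -> str:
    """Inject an error at `attack_index` and ask the model to keep reasoning."""
    poisoned = steps[:attack_index] + [perturb_step(steps[attack_index])]
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(poisoned))
        + f"\nStep {len(poisoned) + 1}:"
    )
    return query_model(prompt)  # the model continues from the poisoned steps
```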
Training-free LLM Merging for Multi-task Learning
Zichuan Fu | Xian Wu | Yejing Wang | Wanyu Wang | Shanshan Ye | Hongzhi Yin | Yi Chang | Yefeng Zheng | Xiangyu Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unified model with multi-task capabilities? We introduce **H**ierarchical **I**terative **Merging** (Hi-Merging), a training-free method for unifying different specialized LLMs into a single model. Specifically, Hi-Merging employs model-wise and layer-wise pruning and scaling, guided by contribution analysis, to mitigate parameter conflicts. Extensive experiments on multiple-choice and question-answering tasks in both Chinese and English validate Hi-Merging's effectiveness for multi-task learning. The results demonstrate that Hi-Merging consistently outperforms existing merging techniques and surpasses the performance of models fine-tuned on combined datasets in most scenarios. Code is available at [Applied-Machine-Learning-Lab/Hi-Merging](https://github.com/Applied-Machine-Learning-Lab/Hi-Merging).
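For intuition, here is a minimal training-free merging sketch in the task-vector style the abstract points to: per-layer deltas from each specialist are pruned and scaled before being folded back into the base model. The fixed `layer_scale` and magnitude-based `prune_ratio` are simplifying assumptions; Hi-Merging derives its model-wise and layer-wise coefficients from contribution analysis rather than fixed constants.

```python
# Minimal sketch of training-free merging via pruned, scaled task vectors.
# Each state dict maps parameter names to tensors; all models share a base.
import torch

def merge_models(base: dict, specialists: list[dict],
                 layer_scale: float = 0.5, prune_ratio: float = 0.2) -> dict:
    """Fold pruned fine-tuning deltas from each specialist into the base."""
    merged = {}
    for name, base_param in base.items():
        deltas = []
        for sd in specialists:
            delta = sd[name] - base_param          # task vector for this layer
            threshold = delta.abs().quantile(prune_ratio)
            delta = torch.where(delta.abs() < threshold,
                                torch.zeros_like(delta), delta)  # drop small entries
            deltas.append(delta)
        merged[name] = base_param + layer_scale * torch.stack(deltas).mean(0)
    return merged
```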
Model Merging for Knowledge Editing
Zichuan Fu | Xian Wu | Guojing Li | Yingying Zhang | Yefeng Zheng | Tianshi Ming | Yejing Wang | Wanyu Wang | Xiangyu Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Large Language Models (LLMs) require continuous updates to maintain accurate and current knowledge as the world evolves. While existing knowledge editing approaches offer various solutions for knowledge updating, they often struggle with sequential editing scenarios and harm the general capabilities of the model, thereby significantly hampering their practical applicability. This paper proposes a two-stage framework combining robust supervised fine-tuning (R-SFT) with model merging for knowledge editing. Our method first fine-tunes the LLM to fully internalize new knowledge, then merges the fine-tuned model with the original foundation model to preserve both the newly acquired knowledge and the model's general capabilities. Experimental results demonstrate that our approach significantly outperforms existing methods in sequential editing while better preserving the original performance of the model, all without requiring any architectural changes. Code is available at [Applied-Machine-Learning-Lab/MM4KE](https://github.com/Applied-Machine-Learning-Lab/MM4KE).
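A hedged sketch of the second stage described above: after fine-tuning internalizes the new knowledge (the R-SFT stage, not shown), the edited weights are interpolated back toward the foundation model to retain general capabilities. The uniform mixing coefficient `alpha` is an assumption for illustration; the paper's merging procedure may weight parameters differently.

```python
# Weight-space interpolation between the foundation model and the
# knowledge-edited (fine-tuned) model; state dicts share parameter names.
import torch

def merge_for_editing(base: dict, finetuned: dict, alpha: float = 0.6) -> dict:
    """Blend edited weights with the base: alpha=1 keeps only the edit."""
    return {
        name: (1 - alpha) * base[name] + alpha * finetuned[name]
        for name in base
    }
```

A larger `alpha` favors the newly injected knowledge, while a smaller one favors the base model's general abilities, which is the trade-off the abstract highlights.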