Xuetao Wei


2025

ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
Yan Yang | Yixia Li | Hongru Wang | Xuetao Wei | James Jianqiao Yu | Yun Chen | Guanhua Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the proliferation of task-specific large language models, delta compression has emerged as a method to mitigate the resource challenges of deploying numerous such models by effectively compressing the delta model parameters. Previous delta-sparsification methods either remove parameters randomly or truncate singular vectors directly after singular value decomposition (SVD). However, these methods either disregard parameter importance entirely or evaluate it with too coarse a granularity. In this work, we introduce ImPart, a novel importance-aware delta sparsification approach. Leveraging SVD, it dynamically adjusts the sparsity ratios of different singular vectors based on their importance, effectively retaining crucial task-specific knowledge even at high sparsity ratios. Experiments show that ImPart achieves state-of-the-art delta sparsification performance, reaching a higher compression ratio than baselines at the same performance level. When integrated with existing methods, ImPart sets a new state of the art on delta quantization and model merging.
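As a rough illustration of the idea in this abstract, the sketch below sparsifies a delta weight matrix (fine-tuned minus base weights) by keeping more entries in the singular vectors that carry more singular-value mass. The function name, the `target_density` budget, and the proportional allocation rule are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def importance_aware_sparsify(delta: torch.Tensor, target_density: float = 0.1) -> torch.Tensor:
    """Sketch of importance-aware delta sparsification: decompose the delta
    weights with SVD, then keep a larger fraction of entries in singular
    vectors with larger singular values (assumed allocation rule)."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)

    # Per-singular-vector keep ratios, proportional to normalized importance.
    importance = S / S.sum()
    keep_ratios = (importance * target_density * len(S)).clamp(max=1.0)

    def sparsify(vec: torch.Tensor, keep: float) -> torch.Tensor:
        # Keep only the k largest-magnitude entries of this singular vector.
        k = max(1, int(keep * vec.numel()))
        thresh = vec.abs().topk(k).values.min()
        return vec * (vec.abs() >= thresh)

    # Sparsify each left/right singular vector, then reconstruct the delta.
    U_sparse = torch.stack(
        [sparsify(U[:, i], float(keep_ratios[i])) for i in range(len(S))], dim=1
    )
    V_sparse = torch.stack(
        [sparsify(Vh[i, :], float(keep_ratios[i])) for i in range(len(S))], dim=0
    )
    return U_sparse @ torch.diag(S) @ V_sparse
```

With `target_density` around 0.1, the reconstruction keeps roughly a tenth of the singular-vector entries overall while concentrating the budget on the most important components.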

LLMs Trust Humans More, That’s a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented Generation
Yuxuan Li | Xinwei Guo | Jiashi Gao | Guanhua Chen | Xiangyu Zhao | Jiaxin Zhang | Quanying Liu | Haiyan Wu | Xin Yao | Xuetao Wei
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-Augmented Generation (RAG) has proven to be an effective approach for addressing the hallucination problem in large language models (LLMs). In current RAG systems, LLMs typically need to synthesize knowledge from two main external sources (user prompts and an external database) to generate a final answer. When the knowledge provided by the user conflicts with that retrieved from the database, a critical question arises: does the LLM favor one knowledge source over the other when generating the answer? In this paper, we are the first to unveil a new phenomenon, Authority Bias, where LLMs tend to favor the knowledge provided by the user even when it deviates from the facts; we rigorously evidence this phenomenon through a comprehensive characterization of Authority Bias in six widely used LLMs and across diverse task scenarios. We propose a dataset specifically designed for detecting Authority Bias, the Authority Bias Detection Dataset (ABDD), and introduce new, detailed metrics to measure it. To mitigate Authority Bias, we propose the Conflict Detection Enhanced Query (CDEQ) framework. We identify the sentences and atomic information that generate conflicts, perform a credibility assessment on the conflicting paragraphs, and ultimately enhance the query to detect perturbed text, thereby reducing Authority Bias. Comparative experiments with widely used mitigation methods demonstrate that CDEQ is effective, improves over prior approaches, and significantly enhances the robustness of RAG systems.
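A minimal sketch of the three-stage idea described above (conflict identification, credibility assessment, query enhancement) follows; the generic `llm(prompt) -> str` callable, the prompt wording, and the function name `cdeq_answer` are illustrative assumptions, not the paper's actual prompts or scoring.

```python
def cdeq_answer(llm, user_prompt: str, retrieved_passages: list[str]) -> str:
    """Sketch of a conflict-detection-enhanced RAG query, assuming a generic
    llm(prompt) -> str callable."""
    context = "\n\n".join(retrieved_passages)

    # Step 1: identify atomic claims in the user input that conflict with
    # the retrieved passages.
    conflicts = llm(
        "List every factual claim in the user input that contradicts the "
        f"retrieved passages, one per line.\n\nUser input:\n{user_prompt}\n\n"
        f"Passages:\n{context}"
    )

    # Step 2: rough credibility assessment of the conflicting material.
    credibility = llm(
        "For each conflict below, judge which source is more credible and why.\n\n"
        f"{conflicts}"
    )

    # Step 3: enhance the query so the answer follows the more credible source
    # instead of deferring to the user by default.
    return llm(
        f"Question:\n{user_prompt}\n\nPassages:\n{context}\n\n"
        f"Known conflicts and credibility notes:\n{credibility}\n\n"
        "Answer the question using the most credible information above."
    )
```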

The Elephant in the Room: Exploring the Role of Neutral Words in Language Model Group-Agnostic Debiasing
Xinwei Guo | Jiashi Gao | Junlei Zhou | Jiaxin Zhang | Guanhua Chen | Xiangyu Zhao | Quanying Liu | Haiyan Wu | Xin Yao | Xuetao Wei
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) are increasingly integrated into our daily lives, raising significant ethical concerns, especially about perpetuating stereotypes. While group-specific debiasing methods have made progress, they often fail to address multiple biases simultaneously. In contrast, group-agnostic debiasing has the potential to mitigate a variety of biases at once, but remains underexplored. In this work, we investigate the role of neutral words (the group-agnostic component) in enhancing the group-agnostic debiasing process. We first reveal that neutral words are essential for preserving semantic modeling, and we propose 𝜖-DPCE, a method that incorporates a neutral word semantics-based loss function to effectively alleviate the deterioration of the Language Modeling Score (LMS) during the debiasing process. Furthermore, by introducing the SCM-Projection method, we demonstrate that SCM-based debiasing eliminates stereotypes by indirectly disrupting the association between attribute and neutral words in the Stereotype Content Model (SCM) space. Our experiments show that neutral words, which often embed multi-group stereotypical objects, play a key role in contributing to the group-agnostic nature of SCM-based debiasing.
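The sketch below illustrates, under stated assumptions, how a neutral-word semantic-preservation term can be combined with a debiasing objective; the projection-based debiasing term, the MSE preservation term, the weighting `lam`, and the function name are illustrative stand-ins rather than the exact 𝜖-DPCE loss.

```python
import torch
import torch.nn.functional as F

def debias_loss_with_neutral_preservation(
    model, orig_model, attribute_ids, neutral_ids, bias_direction, lam=1.0
):
    """Sketch of a neutral-word-aware debiasing objective: push attribute-word
    embeddings off an assumed bias direction while penalizing drift of
    neutral-word embeddings from the original model."""
    emb = model.get_input_embeddings().weight
    orig_emb = orig_model.get_input_embeddings().weight.detach()

    # Debiasing term: shrink the projection of attribute-word embeddings onto
    # the (assumed, precomputed) bias direction of shape (hidden_dim,).
    proj = emb[attribute_ids] @ bias_direction
    debias_term = (proj ** 2).mean()

    # Neutral-word semantics term: keep neutral-word embeddings close to the
    # originals, which is what preserves language-modeling ability during debiasing.
    preserve_term = F.mse_loss(emb[neutral_ids], orig_emb[neutral_ids])

    return debias_term + lam * preserve_term
```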

SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
Yan Yang | Zeguan Xiao | Xin Lu | Hongru Wang | Xuetao Wei | Hailiang Huang | Guanhua Chen | Yun Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The widespread application of large language models (LLMs) has raised concerns about their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SeqAR, a simple yet effective framework for designing jailbreak prompts automatically. The SeqAR framework generates and optimizes multiple jailbreak characters and then applies them sequentially in a single query to bypass the guardrails of the target LLM. Unlike previous work, which relies on proprietary LLMs or seed jailbreak templates crafted by human experts, SeqAR can generate and optimize jailbreak prompts in a cold-start scenario using open-source LLMs without any seed jailbreak templates. Experimental results show that SeqAR achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, and we explore defense strategies against the jailbreak attacks designed by SeqAR.