Jiawei Yang

2026

Treating random masking as a performance plug-in for large language models (LLMs) offers three advantages: low coupling to the task, the model, and training resources. However, the critical drawback is that its gains are highly stochastic. Motivated by this, we propose play-it-by-ear masking performance plug-in (PibE-MPP), which enables LLMs to adaptively select masking target combinations for each task, retaining these advantages and mitigating the drawback. Specifically, we pose two core questions—what are the masking targets and what is the masking strategy under 7 constraints obtained from these advantages and a drawback. For the first question, we select all attention heads in the last layer as masking targets by constructing a first-order Markov process with alternating hidden state and information fusion. The feasibility of this target is validated by random masking experiments. For the second question, we first construct a small yet interpretable candidate set by proposing a three-axis mapping and a mean-based criterion for fusion features of masking targets. We then propose an axis-variance minimization to select a compact masking-target combination, reducing sensitivity to outlier targets. Experiments on 6 LLMs (Qwen and LLaMA) and 24 datasets demonstrate PibE-MPP’s effectiveness and generality, gain stability, and domain performance, and verify the necessity of its final module, providing empirical evidence of its transferability across tasks and models. The code is available at https://github.com/wtctcop/PibE-MPP.

pdf bib abs

Role-playing agents(RPAs) are widely used to steer large language models(LLMs) toward role-consistent behavior, yet existing benchmarks mainly evaluate surface-level fidelity and offer limited insight into decision making under role–alignment value conflicts. To address this gap, we introduce RoleCDE, the first benchmark designed to evaluate RPAs under structured conflicts between role-specific values and alignment-oriented constraints. RoleCDE formulates role-aware decision making as cognitive dilemma scenarios, jointly evaluating role–scenario grounding, value conflict resolution, and decision tendencies. The benchmark is constructed at scale, covering approximately 8k diverse role profiles and scenarios and nearly 240k dilemma instances across three difficulty levels and eight role categories. Evaluation of several mainstream LLMs reveals a "Role Value Decoupling" phenomenon, where agents systematically default to alignment- and morality-consistent decisions rather than role-specific values when the two conflict, even under explicit role conditioning. This behavior is largely invariant to dilemma difficulty but varies substantially across role categories. We further show that RoleCDE-based fine-tuning effectively mitigates this decoupling by improving value trade-off reasoning, while preserving general role-playing fidelity and general reasoning performance. Code is available at: https://github.com/rabbitrose/RoleCDE.

2025

pdf bib abs

The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, the incorporation of external and unverified knowledge increases the vulnerability of LLMs because attackers can perform attack tasks by manipulating knowledge. In this paper, we introduce a benchmark named SafeRAG designed to evaluate the RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct RAG security evaluation dataset (i.e., SafeRAG dataset) primarily manually for each task. We then utilize the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks and even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, resulting in the degradation of RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.

pdf bib abs

Existing pretraining data mixing methods for large language models (LLMs) typically follow a domain-wise methodology, a top-down process that first determines domain weights and then performs uniform data sampling across each domain. However, these approaches neglect significant inter-domain overlaps and commonalities, failing to control the global diversity of the constructed training dataset. Further, uniform sampling within domains ignores fine-grained sample-specific features, potentially leading to suboptimal data distribution. To address these shortcomings, we propose a novel sample-wise data mixture approach based on a bottom-up paradigm. This method performs global cross-domain sampling by systematically evaluating the quality and diversity of each sample, thereby dynamically determining the optimal domain distribution. Comprehensive experiments across multiple downstream tasks and perplexity assessments demonstrate that SampleMix surpasses existing domain-based methods. Meanwhile, SampleMix requires 1.4x to 2.1x fewer training steps to achieve the baselines’ performance, highlighting the substantial potential of SampleMix to optimize pre-training data.