Jiawei Yang
2025
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
Xun Liang
|
Simin Niu
|
Zhiyu Li
|
Sensen Zhang
|
Hanyu Wang
|
Feiyu Xiong
|
Zhaoxin Fan
|
Bo Tang
|
Jihao Zhao
|
Jiawei Yang
|
Shichao Song
|
Mengwei Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, the incorporation of external and unverified knowledge increases the vulnerability of LLMs because attackers can perform attack tasks by manipulating knowledge. In this paper, we introduce a benchmark named SafeRAG designed to evaluate the RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct RAG security evaluation dataset (i.e., SafeRAG dataset) primarily manually for each task. We then utilize the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks and even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, resulting in the degradation of RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity
Xiangyu Xi
|
Deyang Kong
|
Jian Yang
|
Jiawei Yang
|
Zhengyu Chen
|
Wei Wang
|
Jingang Wang
|
Xunliang Cai
|
Shikun Zhang
|
Wei Ye
Findings of the Association for Computational Linguistics: EMNLP 2025
Existing pretraining data mixing methods for large language models (LLMs) typically follow a domain-wise methodology, a top-down process that first determines domain weights and then performs uniform data sampling across each domain. However, these approaches neglect significant inter-domain overlaps and commonalities, failing to control the global diversity of the constructed training dataset. Further, uniform sampling within domains ignores fine-grained sample-specific features, potentially leading to suboptimal data distribution. To address these shortcomings, we propose a novel sample-wise data mixture approach based on a bottom-up paradigm. This method performs global cross-domain sampling by systematically evaluating the quality and diversity of each sample, thereby dynamically determining the optimal domain distribution. Comprehensive experiments across multiple downstream tasks and perplexity assessments demonstrate that SampleMix surpasses existing domain-based methods. Meanwhile, SampleMix requires 1.4x to 2.1x fewer training steps to achieve the baselines’ performance, highlighting the substantial potential of SampleMix to optimize pre-training data.
Search
Fix author
Co-authors
- Xunliang Cai 1
- Zhengyu Chen 1
- Zhaoxin Fan 1
- Deyang Kong 1
- Zhiyu Li 1
- show all...