Xiaoyan Sun
2026
PRA-RAG: Provably Robust Aggregation in Retrieval-Augmented Generation against Retrieval Corruption
Xue Tan | Yi Zheng | Chang Huo | Yunruo Zhang | Yu Liu | Hao Luan | Zhuyang Yu | Jun Dai | Xiaoyan Sun | Ping Chen
Findings of the Association for Computational Linguistics: ACL 2026
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge, effectively mitigating their inherent knowledge limitations. However, RAG remains vulnerable to poisoning attacks that manipulate retrieved texts to mislead model outputs. Existing defense mechanisms often lack theoretical robustness guarantees and perform unreliably when the LLM has limited knowledge of the retrieved content. In this work, we propose PRA-RAG, a provably robust retrieval aggregation algorithm designed to defend against poisoning attacks on retrieved texts. PRA-RAG samples multiple combinations of retrieved texts and utilizes geometric structures in the embedding space to identify a robust subset, from which a stable aggregated representation is derived. We provide theoretical bounds on the maximum impact of poisoned retrieved content and establish a quantitative measure of RAG’s robustness. Experiments across multiple benchmarks and RAG architectures demonstrate that PRA-RAG reduces the attack success rate to as low as 1% while maintaining an accuracy of 71%, significantly outperforming representative state-of-the-art (SOTA) methods.
2025
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis
Xue Tan | Hao Luan | Mingyu Luo | Xiaoyan Sun | Ping Chen | Jun Dai
Findings of the Association for Computational Linguistics: EMNLP 2025
Retrieval-Augmented Generation (RAG) enriches the input to LLMs by retrieving information from a relevant knowledge database, enabling them to produce responses that are more accurate and contextually appropriate. Because the knowledge database is sourced from publicly available channels such as Wikipedia, it inevitably introduces a new attack surface. A RAG poisoning attack injects malicious texts into the knowledge database, ultimately leading the model to generate the attacker’s target response (also called the poisoned response). However, there are currently few methods for detecting such poisoning attacks. In this work, we aim to bridge this gap by introducing RevPRAG, a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection. Our investigation uncovers distinct patterns in LLMs’ activations when generating poisoned responses versus correct responses. Results on multiple benchmarks and RAG architectures show that our approach can achieve a 98% true positive rate while maintaining a false positive rate close to 1%.