Xiyuan Yang


2025

pdf bib
Defending against Indirect Prompt Injection by Instruction Detection
Tongyu Wen | Chenglong Wang | Xiyuan Yang | Haoyu Tang | Yueqi Xie | Lingjuan Lyu | Zhicheng Dou | Fangzhao Wu
Findings of the Association for Computational Linguistics: EMNLP 2025

The integration of Large Language Models (LLMs) with external sources is becoming increasingly common, with Retrieval-Augmented Generation (RAG) being a prominent example. However, this integration introduces vulnerabilities of Indirect Prompt Injection (IPI) attacks, where hidden instructions embedded in external data can manipulate LLMs into executing unintended or harmful actions. We recognize that IPI attacks fundamentally rely on the presence of instructions embedded within external content, which can alter the behavioral states of LLMs. Can the effective detection of such state changes help us defend against IPI attacks? In this paper, we propose InstructDetector, a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks. Specifically, we demonstrate the hidden states and gradients from intermediate layers provide highly discriminative features for instruction detection. By effectively combining these features, InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark. The code is publicly available at https://github.com/MYVAE/Instruction-detection.

2019

pdf bib
Learning Dynamic Context Augmentation for Global Entity Linking
Xiyuan Yang | Xiaotao Gu | Sheng Lin | Siliang Tang | Yueting Zhuang | Fei Wu | Zhigang Chen | Guoping Hu | Xiang Ren
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Despite of the recent success of collective entity linking (EL) methods, these “global” inference methods may yield sub-optimal results when the “all-mention coherence” assumption breaks, and often suffer from high computational cost at the inference stage, due to the complex search space. In this paper, we propose a simple yet effective solution, called Dynamic Context Augmentation (DCA), for collective EL, which requires only one pass through the mentions in a document. DCA sequentially accumulates context information to make efficient, collective inference, and can cope with different local EL models as a plug-and-enhance module. We explore both supervised and reinforcement learning strategies for learning the DCA model. Extensive experiments show the effectiveness of our model with different learning settings, base models, decision orders and attention mechanisms.