Qian Xiong
2026
Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning
Zhiyuan Chang | Mingyang Li | Yuekai Huang | Ziyou Jiang | Xiaojun Jia | Qian Xiong | Junjie Wang | Zhaoyang Li | Qing Wang
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that InstruCoT significantly outperforms baselines in all dimensions while maintaining utility performance without degradation.
Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in LLMs
Yujia Zheng | Tianhao Li | Haotian Huang | Tianyu Zeng | Jingyu Lu | Chuangxin Chu | Yuekai Huang | Ziyou Jiang | Qian Xiong | Yuyao Ge | Mingyang Li
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Prompt-based adversarial attacks are a key tool for assessing the robustness of large language models (LLMs). Yet existing studies typically treat prompts as flat text, overlooking their internal structure: different components within a prompt contribute unequally to robustness. This work introduces PromptAnatomy, a framework that decomposes prompts into functional components, and ComPerturb, a controlled perturbation method that selectively modifies these components to expose component-wise vulnerabilities while ensuring linguistic plausibility via perplexity-based filtering. Using this framework, four instruction-tuning datasets are structurally annotated and validated by human reviewers. Experiments across five advanced LLMs show that ComPerturb achieves state-of-the-art attack success rates, while ablation analyses confirm the complementary effects of prompt dissection and perplexity filtering. These results highlight the importance of structural awareness in evaluating and improving the adversarial robustness of LLMs.
2025
Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems
Qian Xiong | Yuekai Huang | Ziyou Jiang | Zhiyuan Chang | Yujia Zheng | Tianhao Li | Mingyang Li
Findings of the Association for Computational Linguistics: EMNLP 2025
The emergence of the tool agent paradigm has broadened the capability boundaries of the Large Language Model (LLM), enabling it to complete more complex tasks. However, the effectiveness of this paradigm is limited by parameter failures during execution. To explore this phenomenon and propose corresponding suggestions, we first construct a parameter failure taxonomy, deriving five failure categories from the invocation chain of a mainstream tool agent. Then, we explore the correlation between three different input sources and the failure categories by applying 15 input perturbation methods. Experimental results show that parameter name hallucination failure primarily stems from inherent LLM limitations, while the other failure patterns are mainly caused by issues with input sources. To improve the reliability and effectiveness of tool-agent interactions, we propose corresponding improvement suggestions, including standardizing tool return formats, improving error feedback mechanisms, and ensuring parameter consistency.