Zhe Ye


2025

AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
Zhun Wang | Vincent Siu | Zhe Ye | Tianneng Shi | Yuzhou Nie | Xuandong Zhao | Chenguang Wang | Wenbo Guo | Dawn Song
Findings of the Association for Computational Linguistics: EMNLP 2025

There emerges a critical security risk of LLM agents: indirect prompt injection, a sophisticated attack vector that compromises the core of these agents, the LLM, by manipulating contextual information rather than direct user prompts. In this work, we propose a generic black-box optimization framework, AGENTVIGIL, designed to automatically discover and exploit indirect prompt injection vulnerabilities across diverse LLM agents. Our approach starts by constructing a high-quality initial seed corpus, then employs a seed selection algorithm based on Monte Carlo Tree Search (MCTS) to iteratively refine inputs, thereby maximizing the likelihood of uncovering agent weaknesses. We evaluate AGENTVIGIL on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o, respectively, nearly doubling the performance of handcrafted baseline attacks. Moreover, AGENTVIGIL exhibits strong transferability across unseen tasks and internal LLMs, as well as promising results against defenses. Beyond benchmark evaluations, we apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.

2018

Encoding Sentiment Information into Word Vectors for Sentiment Analysis
Zhe Ye | Fang Li | Timothy Baldwin
Proceedings of the 27th International Conference on Computational Linguistics

General-purpose pre-trained word embeddings have become a mainstay of natural language processing, and more recently, methods have been proposed to encode external knowledge into word embeddings to benefit specific downstream tasks. The goal of this paper is to encode sentiment knowledge into pre-trained word vectors to improve the performance of sentiment analysis. Our proposed method is based on a convolutional neural network (CNN) and an external sentiment lexicon. Experiments on four popular sentiment analysis datasets show that this method improves the accuracy of sentiment analysis compared to a number of benchmark methods.