Yanming Guo

2026

Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for backdoor threats. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. However, existing studies remain fragmented and typically analyze individual attack vectors in isolation, leaving the cross-stage interaction and propagation of backdoor triggers poorly understood from an agent-centric perspective.To fill this gap, we propose BackdoorAgent, a modular and stage-aware framework that provides a unified, agent-centric view of backdoor threats in LLM agents. BackdoorAgent structures the attack surface into three functional stages of agentic workflows, including planning attacks, memory attacks, and tool-use attacks, and instruments agent execution to enable systematic analysis of trigger activation and propagation across different stages.Building on this framework, we construct a standardized benchmark spanning four representative agent applications: Agent QA, Agent Code, Agent Web, and Agent Drive, covering both language-only and multimodal settings. Our empirical analysis shows that triggers implanted at a single stage can persist across multiple steps and propagate through intermediate states. For instance, when using a GPT-based backbone, we observe trigger persistence in 43.58% of planning attacks, 77.97% of memory attacks, and 60.28% of tool-stage attacks, highlighting the vulnerabilities of the agentic workflow itself to backdoor threats. Our code is available at https://github.com/Yunhao-Feng/BackdoorAgent.

2025

pdf bib abs

State-of-the-art vision-language models (VLMs) require massive scaling that limits practical deployment. Small-scale VLMs offer a practical alternative but face out-of-domain (OOD) collapse when trained with traditional supervised fine-tuning (SFT). Through GeneralPoints experiments, we identify that OOD collapse is due to SFT’s tendency to induce visual hallucinations under distribution shifts, whereas Reinforcement Learning’s (RL) bidirectional reward-driven mechanism with iterative error correction refines visual perception. Although RL-based post-training effectively mitigates OOD degradation, it faces a critical sparse reward dilemma in complex visual reasoning tasks. To this end, we propose Curriculum Reinforcement Finetuning (Curr-ReFT), comprising two sequential stages: (1) Structured Curriculum Reinforcement Learning, which progressively evolves task formats and reward functions to match models’ growing capabilities; and (2) Rejected Sampling-based Self-improvement, which maintains the fundamental capabilities of VLMs through selective learning from high-quality examples. Extensive experiments demonstrate that Curr-ReFT achieves state-of-the-art performance across various visual tasks in both in- and out-of-domain settings and benchmarks.

Co-authors

Yu Kang 1

Yige Li 1

Rui Ma 1

Venues

Findings2

Fix author