Daizong Ding

2026

Autonomous GUI agents are inherently vulnerable to Environmental Injection Attacks (EIAs). However, existing red-teaming methods face a trade-off between requiring target-specific knowledge and incurring prohibitive computational costs. More fundamentally, a key question remains: what factors determine attack success? To answer this, we first analyze two dimensions: visual appearance (e.g., position, size, color) and semantic content. We find that semantic content dominates, while visual variations have negligible impact. Leveraging this insight, we introduce EVA, a framework that evolves payloads exclusively on the semantic dimension via a discovery-deployment pipeline. Experiments demonstrate that EVA significantly outperforms baselines, achieving 59% to 85% average Attack Success Rate (ASR) while evolving benign seeds into successful attacks within 1.18 to 1.71 iterations. This rapid convergence suggests a dense semantic attack space within the model’s latent space. Whenever an input falls into this space, the agent becomes inherently vulnerable, exposing a fundamental alignment flaw in current multimodal representations.

2025


Graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown great promise for human interaction. However, due to the high cost of fine-tuning, users often rely on open-source GUI agents or APIs offered by AI providers, which introduces a critical but underexplored supply chain threat: backdoor attacks. In this work, we first show that MLLM-powered GUI agents naturally expose multiple interaction-level triggers, such as historical steps, environment states, and task progress. Based on this observation, we introduce AgentGhost, an effective and stealthy framework for red-teaming backdoor attacks. Specifically, we first construct composite triggers by combining goal-level and interaction-level triggers, allowing GUI agents to unintentionally activate backdoors while preserving task utility. Then, we formulate backdoor injection as a Min-Max optimization problem that uses supervised contrastive learning to maximize the feature difference across sample classes in the representation space, improving the flexibility of the backdoor. Meanwhile, it adopts supervised fine-tuning to minimize the discrepancy between backdoor and clean behavior, enhancing effectiveness and utility. Extensive results show that AgentGhost is effective and generic, reaching an attack accuracy of 99.7% on three attack objectives, and stealthy, with only 1% utility degradation. Furthermore, we tailor a defense method against AgentGhost that reduces the attack accuracy to 22.1%.


Venues

Findings (2)
