Pengyu Zhu

2026

We propose a novel data synthesis framework, DecIF, which automatically generates accurate and diverse instruction-following data from scratch for supervised fine-tuning (SFT) and reinforcement learning (RL), leveraging large language models (LLMs) and minimal external resources. By decomposing the data synthesis pipeline into fine-grained steps, DecIF achieves meticulous quality and diversity control over generated instruction-following data. Extensive experiments across both SFT and RL demonstrate DecIF’s strong capability to flexibly synthesize accurate instruction-following data for both paradigms compared to comprehensive baselines. Further analysis demonstrates the framework’s robustness, scalability, and computational efficiency in instruction-following data generation, while its modular design ensures straightforward implementation and reproducibility.

2025

pdf bib abs

As LLM-based agents become increasingly prevalent, triggers implanted in user queries or environment feedback can activate hidden backdoors, raising critical concerns about safety vulnerabilities in agents.However, traditional backdoor attacks are often detectable by safety audits that analyze the reasoning process of agents, hindering further progress in agent safety research.To this end, we propose a novel backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits.To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. Based on these advancements, backdoors are allowed to bypass safety audits significantly.Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks.Experimental results across multiple datasets demonstrate that our method achieves an attack success rate approaching 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits.Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats.Code and data are available at https://github.com/whfeLingYu/DemonAgent.

Co-authors

Venues

ACL1
Findings1

Fix author