Pengyu Zhu
2026
DecIF: Improving Instruction-Following through Decomposition
Tingfeng Hui | Pengyu Zhu | Bowen Ping | Ling Tang | Guanting Dong | Yaqi Zhang | Sen Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We propose a novel data synthesis framework, DecIF, which automatically generates accurate and diverse instruction-following data from scratch for supervised fine-tuning (SFT) and reinforcement learning (RL), leveraging large language models (LLMs) and minimal external resources. By decomposing the data synthesis pipeline into fine-grained steps, DecIF achieves meticulous quality and diversity control over generated instruction-following data. Extensive experiments across both SFT and RL demonstrate DecIF’s strong capability to flexibly synthesize accurate instruction-following data for both paradigms compared to comprehensive baselines. Further analysis demonstrates the framework’s robustness, scalability, and computational efficiency in instruction-following data generation, while its modular design ensures straightforward implementation and reproducibility.
2025
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
Pengyu Zhu | Zhenhong Zhou | Yuanhe Zhang | Shilinlu Yan | Kun Wang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025
As LLM-based agents become increasingly prevalent, triggers implanted in user queries or environment feedback can activate hidden backdoors, raising critical concerns about safety vulnerabilities in agents. However, traditional backdoor attacks are often detectable by safety audits that analyze the reasoning process of agents, hindering further progress in agent safety research. To this end, we propose a novel backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. Based on these advancements, backdoors are allowed to bypass safety audits significantly. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets demonstrate that our method achieves an attack success rate approaching 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.