SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator

Xueyang Zhou; Weidong Wang; Lin Lu; Jiawen Shi; Guiyao Tie; Xu Yongtian; Lixing Chen; Pan Zhou; Neil Zhenqiang Gong; Lichao Sun

Abstract

LLM-based agents are rapidly being deployed in real-world applications (e.g., digital assistants and customer service), making safety a critical concern. However, in multi-turn, tool-augmented settings, dynamic user interactions, external tool use, and unintended harmful behaviors make robust safety assurance challenging. To address these challenges, we propose **SafeAgent**, a framework that improves agent safety through fully automated synthetic data generation. SafeAgent introduces (1) an open and extensible threat model, OTS, that decomposes agent risk into instruction-, context-, and action-induced sources to ground safety analysis and alignment; and (2) an automated pipeline that instantiates OTS to surface scenario-specific failure modes, stress-test agents, and generate self-reflective safe responses, all without hazardous real-world data collection. We evaluate SafeAgent on two safety benchmarks and one real-world terminal task. Across four widely used open-source models, SafeAgent improves safety performance by 45% on average and delivers a 28.91% gain on the real-world task, outperforming state-of-the-art closed-source models. These results highlight the practical advancement and scalability of SafeAgent in building safer LLM agents for real-world deployment.

Anthology ID: 2026.acl-long.1501
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 32516–32543
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1501/
Cite (ACL): Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Xu Yongtian, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. 2026. SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32516–32543, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator (Zhou et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1501.pdf
Checklist: 2026.acl-long.1501.checklist.pdf
