Simple Role Assignment is Extraordinarily Effective for Safety Alignment

Zhou Ziheng; Jiakun Ding; Zhaowei Zhang; Ruosen Gao; Ying Nian Wu; Demetri Terzopoulos; Yipeng Kang; Fangwei Zhong; Junqi Wang

Abstract

Principle-based alignment often lacks context sensitivity and completeness. Grounded in Theory of Mind, we propose role conditioning as a compact alternative: social roles (e.g., mother, judge) implicitly encode both values and the cognitive schemas required to apply them. We introduce a training-free pipeline featuring a role-conditioned generator and iterative role-based critics for refinement. Across benchmarks and five model families, our approach consistently outperforms principle-based, Chain-of-Thought (CoT), and other baselines. Notably, it reduces unsafe outputs on the WildJailbreak benchmark from 81.4% to 3.6% with DeepSeek-V3. The gains are not limited to common safety benchmarks; they also hold on agentic safety tasks. These results establish role assignment as a powerful, interpretable paradigm for AI alignment and LLM-as-a-Judge construction.
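
The pipeline named in the abstract (a role-conditioned generator followed by iterative role-based critics) can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `OK` approval token, the stopping rule, the `max_rounds` default, and the function names `generate`, `critique`, and `refine` are all hypothetical, and `llm` stands in for any function that maps a prompt to a completion.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]

APPROVE = "OK"  # assumed approval token; not taken from the paper


def generate(llm: LLM, role: str, query: str) -> str:
    """Draft a response while conditioned on a social role."""
    return llm(
        f"You are a {role}. Answer the request as this role would.\n"
        f"Request: {query}"
    )


def critique(llm: LLM, role: str, query: str, draft: str) -> str:
    """Ask a role-based critic to approve the draft or describe its concern."""
    return llm(
        f"You are a {role} acting as a CRITIC. Reply {APPROVE} if the draft "
        f"is safe and helpful; otherwise describe the problem.\n"
        f"Request: {query}\nDraft: {draft}"
    )


def refine(
    llm: LLM,
    generator_role: str,
    critic_roles: List[str],
    query: str,
    max_rounds: int = 2,
) -> str:
    """Generate once, then revise until every critic approves or rounds run out."""
    draft = generate(llm, generator_role, query)
    for _ in range(max_rounds):
        feedback = [critique(llm, role, query, draft) for role in critic_roles]
        objections = [f for f in feedback if f.strip().upper() != APPROVE]
        if not objections:  # assumed stopping rule: all critics approve
            break
        draft = llm(
            f"You are a {generator_role}. REVISE the draft to address the feedback.\n"
            f"Request: {query}\nDraft: {draft}\n"
            f"Feedback: " + " | ".join(objections)
        )
    return draft
```

A call such as `refine(my_llm, "mother", ["judge"], user_request)` would use the two example roles from the abstract, where `my_llm` and `user_request` are placeholders for a model wrapper and an input prompt. No weights are updated at any point, which is the sense in which such a loop is training-free.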

Anthology ID: 2026.findings-acl.1164
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23249–23267
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1164/
Cite (ACL): Zhou Ziheng, Jiakun Ding, Zhaowei Zhang, Ruosen Gao, Ying Nian Wu, Demetri Terzopoulos, Yipeng Kang, Fangwei Zhong, and Junqi Wang. 2026. Simple Role Assignment is Extraordinarily Effective for Safety Alignment. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23249–23267, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Simple Role Assignment is Extraordinarily Effective for Safety Alignment (Ziheng et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1164.pdf
Checklist: 2026.findings-acl.1164.checklist.pdf
