Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety

Chenhao Huang, Ziyu Shen, Yicong Ren, Huiyuan Zheng, Jiazheng Zhang, Mingxu Chai, Ming Zhang, Shihan Dou, Fan Mo, Jie Shi, Tao Gui, Qi Zhang, Xuanjing Huang


Abstract
Aligning large language models (LLMs) with human preferences is a central challenge for building reliable AI systems. Most existing alignment approaches rely on static signals, such as predefined principles or offline human annotations, to guide model behavior toward a fixed approximation of human preferences. However, LLMs can exhibit distributional drift during training, and static alignment mechanisms lack the capacity to adaptively correct misaligned behaviors as they emerge. To address this limitation, we develop a two-stage framework that enables dynamic and continuous alignment. In the first stage, a constitution is continually revised based on observed model behaviors, and models are trained to comply with these evolving principles. In the second stage, this learned constitution is used to guide reinforcement learning, encouraging the model to align with the updated normative signals. We refer to this framework as COCOA: Co-evolution of Constitutions and AI Models. We show that COCOA enables a 7B model to greatly improve safety without human annotations, raising its StrongReject score from 0.741 to 0.935 and its Safe-RLHF accuracy from 77.76% to 90.64%, and reaching performance close to that of much larger state-of-the-art models.
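The two-stage loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every name here (`ToyModel`, `observe`, `revise`, `train_to_comply`, `reward`, `co_evolve`) is hypothetical, the "model" is only a set of unsafe behaviour labels, and "training" simply deletes the behaviours a principle covers. It is meant only to show the shape of the idea, assuming principles are added as behaviours are observed and are later reused as a reward signal.

```python
from dataclasses import dataclass, field

PREFIX = "avoid: "  # hypothetical textual form of a principle


@dataclass
class ToyModel:
    # Hypothetical stand-in for an LLM: the unsafe behaviours it still exhibits.
    unsafe_behaviours: set = field(default_factory=set)


def observe(model, probe_size=2):
    """Stand-in for behaviour probing: surface up to probe_size unsafe behaviours."""
    return sorted(model.unsafe_behaviours)[:probe_size]


def revise(constitution, observed):
    """Stage 1 (revision): add one principle per newly observed behaviour."""
    for behaviour in observed:
        rule = PREFIX + behaviour
        if rule not in constitution:
            constitution.append(rule)
    return constitution


def train_to_comply(model, constitution):
    """Stage 1 (training): toy update that drops behaviours the constitution covers."""
    covered = {rule[len(PREFIX):] for rule in constitution}
    model.unsafe_behaviours -= covered
    return model


def reward(response_behaviours, constitution):
    """Stage 2: toy reward, the fraction of principles a response does not violate."""
    if not constitution:
        return 1.0
    violated = sum(1 for rule in constitution
                   if rule[len(PREFIX):] in response_behaviours)
    return 1.0 - violated / len(constitution)


def co_evolve(model, rounds=3):
    """Alternate constitution revision and compliance training for a few rounds."""
    constitution = []
    for _ in range(rounds):
        observed = observe(model)
        if not observed:  # nothing left to correct
            break
        constitution = revise(constitution, observed)
        model = train_to_comply(model, constitution)
    return model, constitution


if __name__ == "__main__":
    toy = ToyModel({"leaks private data", "gives weapon instructions", "insults user"})
    toy, rules = co_evolve(toy)
    print(rules)
    print(reward({"insults user"}, rules))
```

In this toy setting the constitution grows only when the model exhibits a behaviour not yet covered, which mirrors the "continually revised based on observed model behaviors" description; a real system would replace each stand-in with actual generation, critique, and reinforcement-learning components.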
Anthology ID:
2025.emnlp-main.869
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17198–17221
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.869/
Cite (ACL):
Chenhao Huang, Ziyu Shen, Yicong Ren, Huiyuan Zheng, Jiazheng Zhang, Mingxu Chai, Ming Zhang, Shihan Dou, Fan Mo, Jie Shi, Tao Gui, Qi Zhang, and Xuanjing Huang. 2025. Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17198–17221, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety (Huang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.869.pdf
Checklist:
 2025.emnlp-main.869.checklist.pdf