Charles Chang


2026

We introduce CAPC-CG, the Chinese Adaptive Policy Communication (Central Government) Corpus, the first open dataset of Chinese policy directives annotated with a five-color typology of policy signals. The typology is grounded in the theory of adaptive policy communication and captures the clarity and ambiguity of those signals. Spanning 1949–2023, the corpus includes laws, regulations, and rules issued by Chinese central authorities, segmented into 3.3 million paragraph units. We further propose and validate an expert-directed LLM annotation method that integrates codebook design, structured training, a two-step workflow, and LLM-based scaling. Alongside the corpus, we release metadata, our codebook, and a gold-standard labeled set developed by trained coders. Inter-annotator agreement on directive labels reaches a Fleiss’ kappa of κ = 0.86, indicating high reliability. We also provide baseline classification results with several large language models (LLMs) and describe patterns in the data. This release enables downstream tasks and multilingual NLP research on communication strategies under complexity and uncertainty.
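For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. The function name `fleiss_kappa` and the toy counts are illustrative assumptions and are not taken from the corpus or its release; the actual coding setup (number of coders, label set) may differ.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Share of all ratings that fall into each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # Observed agreement per item: fraction of rater pairs that agree.
    pairs = n_raters * (n_raters - 1)
    p_item = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]

    p_obs = sum(p_item) / n_items          # mean observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical toy data: 4 paragraphs, 3 coders, 2 categories.
toy_counts = [
    [3, 0],  # all three coders chose category 0
    [0, 3],  # all three coders chose category 1
    [2, 1],  # split decision
    [1, 2],  # split decision
]
kappa = fleiss_kappa(toy_counts)
```

In this toy table, two unanimous items and two split items yield a kappa of one third, far below the 0.86 reported for the gold-standard set; values near 1 indicate that coders agree well beyond what chance would produce.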