Zhouliang Yu
2026
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Siwei Wu | JinCheng Ren | Xeron Du | Shuyue Guo | Xingwei Qu | Yiming Liang | Jie Liu | Yunwen Li | Tyler Loakman | Tianyu Zheng | Boyu Feng | Huaqing Yuan | Zili Wang | Jiaheng Liu | Wenhao Huang | Chenglin Cai | Haoran Que | Jian Yang | Yuelin Bai | Zekun Moore Wang | Zhouliang Yu | Qunshu Lin | Ding Pan | Yuchen Eleanor Jiang | Tiannan Wang | Wangchunshu Zhou | Shenzhi Wang | Xingyuan Bu | Minghao Liu | Guoyin Wang | Ge Zhang | Chenghua Lin
Findings of the Association for Computational Linguistics: EACL 2026
Existing Chinese preference datasets suffer from limited scale, restricted domain coverage, and insufficiently rigorous data validation. Human annotation significantly limits the scalability of human preference datasets. As a result, Chinese Alignment and Chinese Reward Models (CRM) have not yet been thoroughly explored. To address these challenges, we design an LLM-based data annotation pipeline with no human intervention. Based on this pipeline, we curate COIG-P (Chinese Open Instruction Generalist - Preference), a high-quality, large-scale Chinese preference dataset consisting of 1M Chinese preference pairs and 92k carefully curated Chinese queries across diverse domains, including Chat, Coding, Maths, and others. We conduct experiments to verify the quality of COIG-P from two perspectives. (1) COIG-P brings significant performance improvements for the Qwen2/2.5 and Infinity-Instruct model series on AlignBench through DPO, with gains ranging from 2% to 12%. Furthermore, it significantly outperforms other existing Chinese preference datasets. (2) We train an 8B-sized CRM and manually annotate a Chinese Reward Benchmark (CRBench). Our CRM demonstrates robust scoring ability on CRBench. In addition, in practical data construction experiments, the quality of the data constructed by our CRM is comparable to that produced by GPT-4o.
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Zhongyuan Peng | Yifan Yao | Kaijing Ma | Shuyue Guo | Yizhe Li | Yichi Zhang | Chenchen Zhang | Yifan Zhang | Zhouliang Yu | Luming Li | Minghao Liu | Yihang Xia | Jiawei Shen | Yuchen Wu | Yixin Cao | Zhaoxiang Zhang | Wenhao Huang | Jiaheng Liu | Ge Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase, that is, the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, we first propose CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models’ ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 509K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
Co-authors
- Shuyue Guo 2
- Wenhao Huang 2
- Jiaheng Liu 2
- Minghao Liu 2
- Ge Zhang 2
- Yuelin Bai 1
- Xingyuan Bu 1
- Chenglin Cai 1
- Yixin Cao 1
- Xeron Du 1
- Boyu Feng 1
- Yuchen Eleanor Jiang 1
- Luming Li 1
- Yizhe Li 1
- Yunwen Li 1
- Yiming Liang 1
- Chenghua Lin 1
- Qunshu Lin 1
- Jie Liu 1
- Tyler Loakman 1
- Kaijing Ma 1
- Ding Pan 1
- Zhongyuan Peng 1
- Xingwei Qu 1
- Haoran Que 1
- JinCheng Ren 1
- Jiawei Shen 1
- Guoyin Wang 1
- Shenzhi Wang 1
- Tiannan Wang 1
- Zekun Moore Wang 1
- Zili Wang 1
- Siwei Wu 1
- Yuchen Wu 1
- Yihang Xia 1
- Jian Yang 1
- Yifan Yao 1
- Huaqing Yuan 1
- Chenchen Zhang 1
- Yichi Zhang 1
- Yifan Zhang 1
- Zhaoxiang Zhang 1
- Tianyu Zheng 1
- Wangchunshu Zhou 1