Chenchen Zhang
2026
Rhombus: Incentivizing Coordination in Parallel Thinking through Reinforcement Learning
Ziyuan Nan | Qi Yi | Di Huang | Yutong Wu | Guanhua Huang | Xue Gong | Kejiao Li | Yuhao Jiang | Chenchen Zhang | Zenan Xu | Xing Hu | Bo Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Parallel thinking offers a promising avenue for scaling test-time compute in Large Language Models (LLMs), enabling them to explore diverse solution paths simultaneously before aggregating them into a final answer. However, coordinating the exploration and aggregation stages remains challenging, as simple aggregation techniques often incur information loss, failing to preserve the subtle, decision-relevant signals generated during exploration. To overcome this, we propose Rhombus, a parallel thinking framework that explicitly incentivizes coordination between components via end-to-end reinforcement learning. Rhombus employs multiple parallel Proposers to generate compact, decision-focused reasoning cues and a central Synthesizer to integrate them into final predictions, utilizing co-training under a shared task reward to align their interaction. Across challenging mathematical reasoning benchmarks, Rhombus improves accuracy by 6.0% over long chain-of-thought baselines while reducing wall-clock latency by 39.4% under matched token budgets. Our work demonstrates that explicit communication optimization is essential for realizing the accuracy and efficiency gains of parallel reasoning.
Reinforcement Learning on Pre-Training Data
Siheng Li | Kejiao Li | Zenan Xu | Guanhua Huang | Kun Li | Haoyuan Wu | Wujiajia | Zihao Zheng | Chenchen Zhang | Kun Shi | Xue Gong | Qi Yi | Ruibin Xiong | Tingqiang Xu | Yuhao Jiang | Jianfeng Yan | Yuyuan Zeng | Guanghui Xu | Jinbao Xue | Zhijiang xu | Zheng Fang | Shuai LI | Qibin Liu | Xiaoxue Li | Zhuoyu Li | Yangyu Tao | Fei Gao | Cheng Jiang | Bochao Wang | Kai Liu | Jianchen Zhu | Wai Lam | Bo Zhou | Di Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent progress in large language models (LLMs) is largely driven by scaling training compute through either pre-training with next-token prediction (NTP) or post-training with reinforcement learning (RL). The former contributes to learning broad knowledge and skills from general data, while struggling with data inefficiency and catastrophic forgetting in continual learning settings. The latter incentivizes reasoning capabilities with strong generalization, but is constrained by limited data availability due to its reliance on human annotation. To alleviate these issues, we propose Reinforcement Learning on Pre-Training data (RLPT), which combines the advantages of learning from general data and RL. In particular, RLPT derives reward signals directly from general text data through a next-segment reasoning objective, rewarding the policy for correctly predicting next text segments conditioned on the prefix text. Experiments across multiple benchmarks and models demonstrate the effectiveness of RLPT. For example, RLPT yields substantial improvements in continual pre-training (+4.6%) and provides a strong foundation for post-training (+3.4%) on Qwen3-8B-Base.
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li | Chenchen Zhang | Ruilin Lv | Ao Liu | Ken Deng | Yuanxing Zhang | Jiaheng Liu | Bo Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Large Language Models (LLMs) excel at algorithmic code generation, they struggle with front-end development, where correctness is judged on rendered pixels and interaction. We present ReLook, an agentic, vision-grounded reinforcement learning framework that empowers an agent to close a robust generate–diagnose–refine loop by invoking a multimodal LLM (MLLM) as a tool. During training, the agent employs an MLLM-in-the-loop to serve as a visual critic, evaluating code via screenshots and providing actionable feedback. Crucially, we enforce a strict zero-reward policy for invalid renders to guarantee renderability and mitigate reward hacking. To prevent behavioral collapse, we introduce Forced Optimization, a strict acceptance rule that admits only improving revisions, yielding monotonically better trajectories. At inference, we decouple the critic and run a lightweight, critic-free self-edit cycle, keeping latency comparable to base decoding while retaining most of the gains. Across three widely used benchmarks, ReLook consistently outperforms strong baselines in vision-grounded front-end code generation, highlighting the benefits of agentic perception, visual rewards, and training–inference decoupling.
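The acceptance rule described in the abstract above (admit only improving revisions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `forced_optimization` and the `revise` and `score` callables are hypothetical stand-ins for the agent's edit step and the MLLM critic's score, and the step budget is an assumed parameter.

```python
from typing import Callable, List, Tuple


def forced_optimization(
    initial: str,
    revise: Callable[[str], str],   # hypothetical: proposes a revised version of the code
    score: Callable[[str], float],  # hypothetical: critic score, 0.0 for an invalid render
    max_steps: int = 3,             # assumed revision budget
) -> Tuple[str, List[float]]:
    """Keep a revision only when its score strictly improves on the best so far."""
    best, best_score = initial, score(initial)
    accepted_scores = [best_score]
    for _ in range(max_steps):
        candidate = revise(best)
        candidate_score = score(candidate)
        # Strict acceptance: equal or worse revisions are discarded,
        # so the sequence of accepted scores is strictly increasing.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            accepted_scores.append(best_score)
    return best, accepted_scores
```

With a toy `score` that returns 0.0 once a revision becomes "invalid", the invalid revision is rejected and the last valid one is returned, which mirrors the zero-reward treatment of invalid renders mentioned in the abstract.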
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Zhongyuan Peng | Yifan Yao | Kaijing Ma | Shuyue Guo | Yizhe Li | Yichi Zhang | Chenchen Zhang | Yifan Zhang | Zhouliang Yu | Luming Li | Minghao Liu | Yihang Xia | Jiawei Shen | Yuchen Wu | Yixin Cao | Zhaoxiang Zhang | Wenhao Huang | Jiaheng Liu | Ge Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase, i.e., the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, we first propose CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models’ ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 509K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
2025
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang | Tianhao Cheng | Jason Klein Liu | Weidi Xu | Jiaran Hao | Liuyihan Song | Yang Xu | Jian Yang | Jiaheng Liu | Chenchen Zhang | Linzheng Chai | Ruifeng Yuan | Xianzhen Luo | Qiufeng Wang | YuanTao Fan | Qingfu Zhu | Zhaoxiang Zhang | Yang Gao | Jie Fu | Qian Liu | Houyi Li | Ge Zhang | Yuan Qi | Xu Yinghui | Wei Chu | Zili Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Code LLMs have been widely used in various domains, including code generation, logical reasoning, and agent systems. However, open-access code LLMs mostly only release weights, lacking key features such as reproducible data pipelines and transparent training protocols, which are crucial for advancing deeper, more reliable investigations. To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an “open cookbook” for the research community. Unlike most prior efforts, we release not only model weights and inference code, but also the reproducible training data, complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research. Our work identifies the key ingredients for building a top-tier code LLM: optimized heuristic rules for data cleaning and deduplication, effective recall of code-related text corpus, and high-quality synthetic data for both annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research and enable reproducible advancements in code intelligence. The released resource is available at https://opencoder-llm.github.io.
2024
E2-LLM: Efficient and Extreme Length Extension of Large Language Models
Jiaheng Liu | Zhiqi Bai | Yuanxing Zhang | Chenchen Zhang | Yu Zhang | Ge Zhang | Jiakai Wang | Haoran Que | Yukang Chen | Wenbo Su | Tiezheng Ge | Jie Fu | Wenhu Chen | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2024
Training Large Language Models (LLMs) to process extensive context lengths incurs prohibitive computational costs. Prevailing techniques for extending context capabilities in LLMs typically require not only additional training procedures but also access to datasets with long context (e.g., sequences of 32K tokens), presupposing substantial GPU expenditures. To address these issues, we introduce a novel solution named Efficient and Extreme length extension for Large Language Models (E2-LLM). E2-LLM entails a single training process over considerably short sequences (e.g., 4K tokens), which greatly reduces the cost of continual pre-training or fine-tuning. Within the training phase, we incorporate a dual augmentation strategy with Rotary Position Embeddings (RoPE) that adjusts the scale and position indices across distinct training samples. E2-LLM is meticulously designed to enhance the model’s robustness to diverse relative positions. The experimental results on multiple benchmark datasets demonstrate the superior performance of E2-LLM on demanding tasks of processing long contexts.
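One way to picture an augmentation that varies both the scale and the position indices per training sample is sketched below. The abstract does not give the sampling scheme, so everything here is an assumption for illustration: the function names, the uniform integer sampling of `scale` and `offset`, and the interpolation-style division of positions by the scale. Only the RoPE angle formula (position times `base ** (-2k/dim)`) is standard.

```python
import random
from typing import List


def augmented_positions(
    seq_len: int, max_scale: int, max_offset: int, rng: random.Random
) -> List[float]:
    """Sample one (scale, offset) pair per training sample (assumed scheme)."""
    scale = rng.randint(1, max_scale)    # assumed: integer scale factor
    offset = rng.randint(0, max_offset)  # assumed: shift of the starting index
    # Shift the indices, then compress them by the scale factor.
    return [(offset + i) / scale for i in range(seq_len)]


def rope_angles(position: float, dim: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE rotation angles for one (possibly fractional) position."""
    return [position * base ** (-2 * k / dim) for k in range(dim // 2)]


rng = random.Random(0)
positions = augmented_positions(seq_len=4, max_scale=8, max_offset=16, rng=rng)
angles = [rope_angles(p, dim=8) for p in positions]
```

Under these assumptions, a short training sequence is exposed to many different relative-position spacings, which is the kind of robustness the abstract describes.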
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
Yanan Wu | Jie Liu | Xingyuan Bu | Jiaheng Liu | Zhanhui Zhou | Yuanxing Zhang | Chenchen Zhang | Zhiqi Bai | Haibin Chen | Tiezheng Ge | Wanli Ouyang | Wenbo Su | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2024
This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can be evaluated at different granularity with concept-wise accuracies. Based on ConceptMath, we then evaluate a broad range of LLMs and observe that existing LLMs, though achieving high average accuracies on traditional benchmarks, exhibit significant performance variations across different math concepts and may even fail catastrophically on the most basic ones. In addition, we introduce an efficient fine-tuning strategy to address the weaknesses of existing LLMs. Finally, we hope ConceptMath can guide developers in understanding the fine-grained mathematical abilities of their models and facilitate the growth of foundation models. Code is available at https://github.com/conceptmath/conceptmath.
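Concept-wise accuracy over a hierarchy, as opposed to a single average, can be sketched as below. This is an illustrative sketch only: the function name, the `/`-separated concept path format, and the example concept names are assumptions, not the benchmark's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def concept_accuracies(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate accuracy at every level of a concept path such as 'algebra/linear'."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for path, is_correct in results:
        parts = path.split("/")
        # Credit the problem to its leaf concept and to every ancestor concept.
        for depth in range(1, len(parts) + 1):
            node = "/".join(parts[:depth])
            totals[node] += 1
            correct[node] += int(is_correct)
    return {node: correct[node] / totals[node] for node in totals}


# Hypothetical graded results: (concept path, whether the model answered correctly).
results = [
    ("algebra/linear", True),
    ("algebra/linear", True),
    ("algebra/quadratic", False),
    ("geometry/angles", True),
]
accuracies = concept_accuracies(results)
```

In this toy data the overall average is 75%, yet the `algebra/quadratic` concept sits at 0%, which is the kind of variation a single average hides.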
Co-authors
- Jiaheng Liu 5
- Yuanxing Zhang 3
- Ge Zhang 3
- Bo Zhou 3
- Zhiqi Bai 2
- Jie Fu 2
- Tiezheng Ge 2
- Xue Gong 2
- Guanhua Huang 2
- Yuhao Jiang 2
- Kejiao Li 2
- Wenbo Su 2
- Zenan Xu 2
- Qi Yi 2
- Zhaoxiang Zhang 2
- Bo Zheng 2
- Xingyuan Bu 1
- Yixin Cao 1
- Linzheng Chai 1
- Yukang Chen 1
- Wenhu Chen 1
- Haibin Chen 1
- Tianhao Cheng 1
- Wei Chu 1
- Ken Deng 1
- Yuantao Fan 1
- Zheng Fang 1
- Fei Gao 1
- Yang Gao 1
- Shuyue Guo 1
- Jiaran Hao 1
- Xing Hu 1
- Di Huang 1
- Siming Huang 1
- Wenhao Huang 1
- Cheng Jiang 1
- Shuai LI 1
- Wai Lam 1
- Siheng Li 1
- Kun Li 1
- Xiaoxue Li 1
- Zhuoyu Li 1
- Houyi Li 1
- Yuhang Li 1
- Yizhe Li 1
- Luming Li 1
- Qibin Liu 1
- Kai Liu 1
- Jie Liu 1
- Jason Klein Liu 1
- Qian Liu 1
- Ao Liu 1
- Minghao Liu 1
- Xianzhen Luo 1
- Ruilin Lv 1
- Kaijing Ma 1
- Ziyuan Nan 1
- Wanli Ouyang 1
- Zhongyuan Peng 1
- Yuan Qi 1
- Haoran Que 1
- Jiawei Shen 1
- Kun Shi 1
- Liuyihan Song 1
- Yangyu Tao 1
- Jiakai Wang 1
- Bochao Wang 1
- Di Wang 1
- Qiufeng Wang 1
- Zili Wang 1
- Yutong Wu 1
- Haoyuan Wu 1
- Yanan Wu 1
- Yuchen Wu 1
- Wujiajia 1
- Yihang Xia 1
- Ruibin Xiong 1
- Tingqiang Xu 1
- Guanghui Xu 1
- Weidi Xu 1
- Yang Xu 1
- Jinbao Xue 1
- Jianfeng Yan 1
- Jian Yang 1
- Yifan Yao 1
- Xu Yinghui 1
- Zhouliang Yu 1
- Ruifeng Yuan 1
- Yuyuan Zeng 1
- Yu Zhang 1
- Yichi Zhang 1
- Yifan Zhang 1
- Zihao Zheng 1
- Zhanhui Zhou 1
- Jianchen Zhu 1
- Qingfu Zhu 1
- Zhijiang xu 1