Ruibin Xiong
2026
Reinforcement Learning on Pre-Training Data
Siheng Li | Kejiao Li | Zenan Xu | Guanhua Huang | Kun Li | Haoyuan Wu | Wujiajia | Zihao Zheng | Chenchen Zhang | Kun Shi | Xue Gong | Qi Yi | Ruibin Xiong | Tingqiang Xu | Yuhao Jiang | Jianfeng Yan | Yuyuan Zeng | Guanghui Xu | Jinbao Xue | Zhijiang xu | Zheng Fang | Shuai LI | Qibin Liu | Xiaoxue Li | Zhuoyu Li | Yangyu Tao | Fei Gao | Cheng Jiang | Bochao Wang | Kai Liu | Jianchen Zhu | Wai Lam | Bo Zhou | Di Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent progress in large language models (LLMs) is largely driven by scaling training compute through either pre-training with next-token prediction (NTP) or post-training with reinforcement learning (RL). The former contributes to learning broad knowledge and skills from general data, while struggling with data inefficiency and catastrophic forgetting in continual learning settings. The latter incentivizes reasoning capabilities with strong generalization, but is constrained by limited data availability due to its reliance on human annotation. To alleviate these issues, we propose Reinforcement Learning on Pre-Training data (RLPT), which combines the advantages of learning from general data and RL. In particular, RLPT derives reward signals directly from general text data through a next-segment reasoning objective, rewarding the policy for correctly predicting next text segments conditioned on the prefix text. Experiments across multiple benchmarks and models demonstrate the effectiveness of RLPT. For example, RLPT yields substantial improvements in continual pre-training (+4.6%) and provides a strong foundation for post-training (+3.4%) on Qwen3-8B-Base.
2025
Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models
Ruibin Xiong | Yimeng Chen | Dmitrii Khizbullin | Mingchen Zhuge | Jürgen Schmidhuber
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Long-form writing agents require flexible integration and interaction across information retrieval, reasoning, and composition. Current approaches rely on predefined workflows and rigid thinking patterns to generate outlines before writing, resulting in constrained adaptability during writing. In this paper we propose WriteHERE, a general agent framework that achieves human-like adaptive writing through recursive task decomposition and dynamic integration of three fundamental task types: retrieval, reasoning, and composition. Our methodology features: 1) a planning mechanism that interleaves recursive task decomposition and execution, eliminating artificial restrictions on writing workflow; and 2) integration of task types that facilitates heterogeneous task decomposition. Evaluations on both fiction writing and technical report generation show that our method consistently outperforms state-of-the-art approaches across all automatic evaluation metrics, demonstrating the effectiveness and broad applicability of our proposed framework. We have publicly released our code and prompts to facilitate further research.
Co-authors
- Yimeng Chen 1
- Zheng Fang 1
- Fei Gao 1
- Xue Gong 1
- Guanhua Huang 1
- Cheng Jiang 1
- Yuhao Jiang 1
- Dmitrii Khizbullin 1
- Shuai LI 1
- Wai Lam 1
- Kejiao Li 1
- Kun Li 1
- Siheng Li 1
- Xiaoxue Li 1
- Zhuoyu Li 1
- Kai Liu 1
- Qibin Liu 1
- Jürgen Schmidhuber 1
- Kun Shi 1
- Yangyu Tao 1
- Bochao Wang 1
- Di Wang 1
- Haoyuan Wu 1
- Wujiajia 1
- Guanghui Xu 1
- Tingqiang Xu 1
- Zenan Xu 1
- Jinbao Xue 1
- Jianfeng Yan 1
- Qi Yi 1
- Yuyuan Zeng 1
- Chenchen Zhang 1
- Zihao Zheng 1
- Bo Zhou 1
- Jianchen Zhu 1
- Mingchen Zhuge 1
- Zhijiang xu 1