Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series

Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang


Abstract
Recently, the demand for small and efficient reasoning models to support real-world applications has driven the development of knowledge distillation techniques that balance reasoning performance and inference speed. In this paper, we further extend the DistilQwen model family, initialized from the Qwen models, by introducing four model series specifically designed to meet industrial requirements. The distilled model collection comprises: (1) slow-thinking models, optimized for reasoning tasks that require high accuracy; (2) two series of adaptive-thinking models, which dynamically adjust reasoning strategies based on input tasks to maximize efficiency across diverse scenarios; and (3) distilled reward models, which enable further reinforcement learning of reasoning models using distilled knowledge. Comprehensive evaluations across multiple benchmarks demonstrate both high inference efficiency and strong reasoning performance for these models, as well as the practical utility of distilled reward models. We further show that these models support industry practitioners by providing scalable training and inference functionalities on the Alibaba Cloud PAI (Platform for Artificial Intelligence) platform.
Anthology ID:
2025.emnlp-industry.94
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1357–1365
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.94/
Cite (ACL):
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, and Xiangzhong Fang. 2025. Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1357–1365, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series (Cai et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.94.pdf