Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song
Abstract
Distilling large reasoning models (LRMs) has become essential for making their Long-CoT reasoning capabilities practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches, which select complete reasoning traces post-hoc, overlook the collaborative potential of heterogeneous teachers and fail to adapt exploration dynamically, often leading to redundant sampling and missed opportunities for complementary reasoning. To address this, we introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity–based scoring and beam search. This approach enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while maintaining diverse, high-potential hypotheses efficiently. Experiments show that CoRD generates higher-quality reasoning data and achieves student performance approaching teacher-level results, demonstrating that fine-grained collaboration among diverse LRMs yields structured, efficient, and robust reasoning distillation. The dataset and model are available at https://github.com/DISL-Lab/CoRD-
- Anthology ID: 2026.findings-acl.1867
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 37452–37468
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1867/
- Cite (ACL): Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, and Hwanjun Song. 2026. Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37452–37468, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding (Yun et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1867.pdf
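
As a reading aid for the abstract, the idea of step-wise multi-teacher decoding with perplexity-based scoring and beam search can be pictured with the toy sketch below. It is not the CoRD implementation: the abstract does not define "predictive perplexity" or the step granularity, so this sketch assumes ordinary perplexity (the exponential of the negative mean token log-probability) accumulated per step. The names `Teacher`, `Scorer`, `collaborative_beam_search`, `teacher_a`, `teacher_b`, and `toy_scorer` are all hypothetical stand-ins for real models.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a teacher proposes one next reasoning step given the steps so far;
# a scorer returns token log-probabilities for that step given the prefix.
Teacher = Callable[[Sequence[str]], str]
Scorer = Callable[[Sequence[str], str], List[float]]


def perplexity(logprobs: List[float]) -> float:
    """Ordinary perplexity: exp of the negative mean token log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))


def collaborative_beam_search(
    teachers: Sequence[Teacher],
    scorer: Scorer,
    beam_width: int,
    max_steps: int,
) -> List[Tuple[float, List[str]]]:
    """Grow traces one step at a time, pooling proposals from every teacher."""
    # Each beam entry is (accumulated log-perplexity, steps so far).
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(max_steps):
        candidates: List[Tuple[float, List[str]]] = []
        for cost, steps in beams:
            for teacher in teachers:
                step = teacher(steps)
                step_cost = math.log(perplexity(scorer(steps, step)))
                candidates.append((cost + step_cost, steps + [step]))
        # Keep the beam_width traces with the lowest accumulated cost.
        candidates.sort(key=lambda c: c[0])
        beams = candidates[:beam_width]
    return beams


# Toy stand-ins: each "teacher" tags its step with its own letter, and the
# toy scorer prefers teacher A on even steps and teacher B on odd steps.
def teacher_a(steps: Sequence[str]) -> str:
    return f"A{len(steps)}"


def teacher_b(steps: Sequence[str]) -> str:
    return f"B{len(steps)}"


def toy_scorer(steps: Sequence[str], step: str) -> List[float]:
    preferred = "A" if len(steps) % 2 == 0 else "B"
    return [-0.1] if step.startswith(preferred) else [-2.0]


best_cost, best_trace = collaborative_beam_search(
    [teacher_a, teacher_b], toy_scorer, beam_width=2, max_steps=3
)[0]
print(best_trace)
```

In this toy setting the lowest-cost trace mixes steps from both teachers, which is the behavior that selecting whole traces from a single teacher cannot produce. A real pipeline would replace the stand-ins with calls to actual LRMs and would need a stopping criterion for completed traces, neither of which is specified here.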