Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song


Abstract
Distilling large reasoning models (LRMs) has become essential for making their Long-CoT reasoning capabilities practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches, which select complete reasoning traces post-hoc, overlook the collaborative potential of heterogeneous teachers and fail to adapt exploration dynamically, often leading to redundant sampling and missed opportunities for complementary reasoning. To address this, we introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity–based scoring and beam search. This approach enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while maintaining diverse, high-potential hypotheses efficiently. Experiments show that CoRD generates higher-quality reasoning data and achieves student performance approaching teacher-level results, demonstrating that fine-grained collaboration among diverse LRMs yields structured, efficient, and robust reasoning distillation. The dataset and model are available at https://github.com/DISL-Lab/CoRD
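As a rough illustration of the step-wise, multi-teacher decoding idea described in the abstract, the sketch below runs a beam search in which every teacher proposes a next reasoning step for each partial trajectory, and trajectories are ranked by perplexity. It is a toy under stated assumptions, not the authors' implementation: the `Teacher` interface, the two stub teachers, and the use of plain trajectory-level perplexity as a stand-in for the paper's predictive perplexity-based score are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical teacher interface (an assumption, not from the paper): given the
# reasoning steps so far, return one proposed next step plus the token
# log-probabilities the teacher assigned to that step.
Teacher = Callable[[Sequence[str]], Tuple[str, List[float]]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("perplexity needs at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def collaborative_beam_search(
    teachers: Sequence[Teacher], num_steps: int, beam_width: int
) -> List[Tuple[List[str], float]]:
    """Expand each beam with one step from every teacher; keep the lowest-perplexity beams."""
    # Each beam is (steps so far, all token log-probs accumulated so far).
    beams: List[Tuple[Tuple[str, ...], List[float]]] = [((), [])]
    for _ in range(num_steps):
        candidates = []
        for steps, logprobs in beams:
            for teacher in teachers:
                step, step_logprobs = teacher(steps)
                candidates.append((steps + (step,), logprobs + list(step_logprobs)))
        # Assumed scoring rule: lower trajectory perplexity is better.
        candidates.sort(key=lambda cand: perplexity(cand[1]))
        beams = candidates[:beam_width]
    return [(list(steps), perplexity(logprobs)) for steps, logprobs in beams]


# Stub teachers for demonstration only; real teachers would be LRMs.
def confident_teacher(steps: Sequence[str]) -> Tuple[str, List[float]]:
    return f"A{len(steps)}", [-0.1, -0.1]


def uncertain_teacher(steps: Sequence[str]) -> Tuple[str, List[float]]:
    return f"B{len(steps)}", [-1.0, -1.0]


if __name__ == "__main__":
    results = collaborative_beam_search(
        [confident_teacher, uncertain_teacher], num_steps=2, beam_width=2
    )
    for steps, ppl in results:
        print(steps, round(ppl, 3))
```

With these stubs, the top-ranked trajectory consists only of steps from the confident teacher, while the second beam mixes steps from both teachers, which is the kind of step-level interleaving that selecting complete traces post-hoc cannot produce.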
Anthology ID:
2026.findings-acl.1867
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
37452–37468
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1867/
Cite (ACL):
Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, and Hwanjun Song. 2026. Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37452–37468, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding (Yun et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1867.pdf
Checklist:
 2026.findings-acl.1867.checklist.pdf