From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement

JianZhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, Buzhou Tang


Abstract
Chain-of-Thought (CoT) reasoning improves performance on complex tasks but introduces significant inference latency due to its verbosity. In this work, we propose Multiround Adaptive Chain-of-Thought Compression (MACC), a framework that leverages the token elasticity phenomenon, in which overly small token budgets may paradoxically increase output length, to progressively compress CoTs via multiround refinement. This adaptive strategy allows MACC to dynamically determine the optimal compression depth for each input. Our method achieves an average accuracy improvement of 5.6% over state-of-the-art baselines while also reducing CoT length by an average of 47 tokens and significantly lowering latency. Furthermore, we show that test-time performance (accuracy and token length) can be reliably predicted from interpretable features such as perplexity and compression rate on the training set. Evaluated across different models, our method enables efficient model selection and performance forecasting without repeated fine-tuning, demonstrating that CoT compression is both effective and predictable. Our code will be released at https://github.com/Leon221220/MACC.
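The multiround, per-input adaptive compression described above can be illustrated with a small sketch. It assumes a hypothetical `compress_fn` (a stand-in for an LLM prompted to shorten a CoT), a hypothetical `is_correct` answer check, and a whitespace token count in place of a real tokenizer. The stopping rule (halt when a round no longer shortens the CoT, mimicking token elasticity, or when the compressed CoT no longer yields a correct answer) and the compression-rate definition are illustrative assumptions, not necessarily MACC's exact criteria.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def multiround_compress(
    cot: str,
    compress_fn: Callable[[str], str],
    is_correct: Callable[[str], bool],
    max_rounds: int = 5,
) -> Tuple[str, List[int]]:
    """Iteratively compress a CoT; the number of rounds adapts per input.

    Stops (keeping the previous CoT) when a round fails to shorten the
    CoT, as under token elasticity, or breaks answer correctness.
    """
    current = cot
    lengths = [count_tokens(cot)]
    for _ in range(max_rounds):
        candidate = compress_fn(current)
        cand_len = count_tokens(candidate)
        if cand_len >= lengths[-1] or not is_correct(candidate):
            break  # no further gain: this input's compression depth is reached
        current = candidate
        lengths.append(cand_len)
    return current, lengths


def toy_compress(text: str) -> str:
    # Hypothetical compressor: drops one filler word "so" per round.
    words = text.split()
    if "so" in words:
        words.remove("so")
        return " ".join(words)
    # Nothing left to drop: mimic token elasticity by growing the output.
    return text + " let me double check"


cot = "so 2 + 3 so = 5 so answer 5"
final, lengths = multiround_compress(cot, toy_compress, lambda c: c.endswith("5"))
# Assumption: compression rate defined as compressed length / original length.
rate = lengths[-1] / lengths[0]
print(final)    # 2 + 3 = 5 answer 5
print(lengths)  # [10, 9, 8, 7]
print(rate)     # 0.7
```

In this toy run, three rounds succeed and the fourth is rejected because the output grows, so the depth is chosen per input rather than fixed in advance.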
Anthology ID: 2025.emnlp-main.618
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12290–12306
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.618/
Cite (ACL): JianZhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, and Buzhou Tang. 2025. From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12290–12306, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement (Yan et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.618.pdf
Checklist: 2025.emnlp-main.618.checklist.pdf