CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning

Yangsong Lan, Hongliang Dai, Piji Li


Abstract
Long Chain-of-Thought (CoT) reasoning is pivotal for the success of recent reasoning models but suffers from high computational overhead and latency. While prior works attempt to compress CoT with external compressors, they often fail to align with the model’s internal reasoning dynamics, resulting in the loss of critical logical steps. This paper presents Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning (CRISP), a framework that compresses CoT by exploiting the model’s intrinsic saliency. Our analysis reveals a distinct phenomenon: the reasoning termination token acts as an information anchor, and its attention pattern effectively demarcates essential reasoning from redundancy. Based on this finding, we design a policy that uses these intrinsic attention signals to guide atomic compression operations. In contrast to coarse-grained pruning strategies, CRISP strategically distills the reasoning chain to maximize information density while preserving logical coherence. Empirical results across various backbone models and mathematical datasets demonstrate that CRISP achieves a 50-60% reduction in token count without compromising accuracy, effectively mitigating the efficiency bottleneck of long-context reasoning. We open-source our implementation to facilitate further research in efficient reasoning.
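The idea of using the termination token's attention as a saliency signal can be illustrated with a minimal sketch. This is not the CRISP policy itself, which the abstract describes only as attention-guided atomic compression operations. The function name `prune_by_anchor_attention`, the `keep_ratio` parameter, the step-level granularity, and the example attention values are all hypothetical, chosen only to show how an anchor token's attention could separate essential steps from redundant ones.

```python
import math
from typing import List, Sequence


def prune_by_anchor_attention(
    steps: Sequence[str],
    anchor_attention: Sequence[float],
    keep_ratio: float = 0.5,
) -> List[str]:
    """Keep the reasoning steps that receive the most attention from the anchor.

    Assumption (not from the paper): anchor_attention[i] is the attention mass
    the reasoning termination token places on step i, already aggregated over
    heads and layers.
    """
    if len(steps) != len(anchor_attention):
        raise ValueError("steps and anchor_attention must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    n_keep = max(1, math.ceil(len(steps) * keep_ratio))
    # Rank step indices by saliency, highest first; ties favor the earlier step.
    ranked = sorted(range(len(steps)), key=lambda i: (-anchor_attention[i], i))
    # Restore the original order so the kept steps still read as a chain.
    kept = sorted(ranked[:n_keep])
    return [steps[i] for i in kept]


# Toy chain with made-up attention values (illustrative only).
steps = [
    "Let x be the unknown.",
    "Wait, let me re-read the problem.",
    "2x + 3 = 11, so 2x = 8.",
    "Hmm, double-checking the arithmetic.",
    "Thus x = 4.",
    "Let me verify once more.",
]
attention = [0.20, 0.03, 0.35, 0.04, 0.33, 0.05]
compressed = prune_by_anchor_attention(steps, attention, keep_ratio=0.5)
```

With these toy values, the three steps carrying the derivation are kept and the three self-checking steps are dropped, a 50% reduction in step count. In practice the attention values would have to be extracted from the model, and a real policy would likely operate at a finer granularity than whole steps.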
Anthology ID:
2026.findings-acl.1961
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
39355–39373
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1961/
Cite (ACL):
Yangsong Lan, Hongliang Dai, and Piji Li. 2026. CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39355–39373, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning (Lan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1961.pdf
Checklist:
2026.findings-acl.1961.checklist.pdf