Tianle Chen
2026
Adaptive Spatial and Temporal Redundancy Optimization for Efficient Reasoning in Large Language Models
Tianle Chen | Pengyu Cheng | Qiyuan Zhu | Jiacheng Wang | Bei Liu | Hao Gu | Ruijie Shen | Xiaofeng Hou | Sirui Han | Jiacheng Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved exceptional performance in complex reasoning via Chain-of-Thought (CoT), yet the associated computational costs remain prohibitive. CoT reasoning contains significant untapped efficiency potential across two dimensions: temporal redundancy, where reasoning steps may be unnecessary, and spatial redundancy, where computations can be performed at reduced precision. While current optimization techniques often necessitate resource-intensive fine-tuning or data curation, we introduce ASTRO (Adaptive Spatial and Temporal Redundancy Optimization), a training-free framework that simultaneously addresses both dimensions. ASTRO leverages Dewey’s reflective thinking model to segment reasoning phases, applying a progressive precision reduction strategy coupled with an entropy-based confidence mechanism for adaptive termination. Empirical results across diverse reasoning benchmarks demonstrate that ASTRO achieves up to an 11.3× efficiency gain without compromising accuracy, highlighting the advantages of holistic multi-dimensional redundancy management over isolated optimization methods.
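
The abstract does not specify how the entropy-based confidence mechanism is computed, so the following is only a minimal sketch of one common way such a stopping rule could look, not ASTRO's actual procedure. It assumes access to the model's next-token probability distribution at each reasoning step; the function names `token_entropy` and `should_stop`, the `threshold` value, and the `patience` window are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution.

    Zero-probability entries are skipped, since they contribute nothing.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_stop(
    step_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical entropy cutoff, in nats
    patience: int = 2,       # hypothetical number of consecutive confident steps
) -> bool:
    """Return True when the last `patience` distributions are all low-entropy.

    Low entropy is treated here as a proxy for model confidence; this is an
    assumption of the sketch, not a detail taken from the paper.
    """
    if len(step_probs) < patience:
        return False
    return all(token_entropy(p) < threshold for p in step_probs[-patience:])


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]  # maximally uncertain over 4 tokens
    peaked = [0.97, 0.01, 0.01, 0.01]   # strongly favors one token
    print(should_stop([uniform, peaked, peaked]))
    print(should_stop([peaked, uniform]))
```

Requiring several consecutive low-entropy steps, rather than a single one, is a design choice of this sketch that guards against terminating on one momentarily confident token.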