Smarter, Not Harder: Training-Free Adaptive Computation for Transformers

Romain Storaï, Jaeseong Lee, Seung-won Hwang


Abstract
Adaptive Computation in Transformers (ACT) has been pursued in two directions: efficiency-focused and performance-focused. We study performance-focused ACT, or PACT, which invests more computation in hard steps to improve performance, for example by adding forward passes. We first discuss beam search and hesitation-based methods as PACT and their limitations. While the hesitation-based approach outperforms beam search by perturbing input embeddings, it suffers from inefficiency, since the perturbation invalidates the KVCache, and from instability, due to its reliance on randomness. To address this, we propose IMPACT, a novel PACT method that perturbs network weights rather than input embeddings. This approach enables the reuse of the KVCache, offers deterministic predictions, and significantly improves memory and computational efficiency. By achieving a better balance between performance and efficiency, IMPACT makes PACT accessible to communities with consumer-grade hardware.
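The KVCache argument in the abstract can be illustrated with a toy single-head attention step. This is a minimal sketch, not the paper's actual IMPACT implementation: the dimensions, the additive perturbations, and the choice to perturb weights only for the current step's extra pass are illustrative assumptions. The point it shows is structural: perturbing input embeddings changes the keys/values of past tokens (cache stale), while perturbing weights used only for the current step leaves cached entries, computed with the original weights, untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_k = rng.normal(size=(d, d))   # key projection (toy)
W_v = rng.normal(size=(d, d))   # value projection (toy)
past = rng.normal(size=(3, d))  # embeddings of 3 already-decoded tokens

# KVCache built once from the unperturbed model.
K_cache, V_cache = past @ W_k, past @ W_v

# Hesitation-style approach: random noise on the *input embeddings*.
# Past keys change, so the cached entries no longer match and must be
# recomputed -- this is the inefficiency the abstract describes.
noisy_past = past + 0.1 * rng.normal(size=past.shape)
K_noisy = noisy_past @ W_k
cache_valid_embedding = bool(np.allclose(K_noisy, K_cache))

# Weight-perturbation approach (sketch): a deterministic perturbation of
# the weights, applied only to the current step's extra forward pass.
# Cached K/V of past tokens were computed with the original weights and
# are reused as-is; only the new token's key is appended.
x_t = rng.normal(size=(1, d))   # current-step embedding
W_k_pert = W_k + 0.1            # deterministic, no randomness
k_t = x_t @ W_k_pert            # extra pass for the hard step
K_full = np.vstack([K_cache, k_t])

cache_valid_weights = bool(np.allclose(K_full[:3], K_cache))
print(cache_valid_embedding, cache_valid_weights)  # → False True
```

Rerunning the weight-perturbed pass with the same perturbation reproduces the same output, which is the source of the deterministic-prediction claim, in contrast to the noise-based hesitation approach.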
Anthology ID:
2025.findings-acl.426
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8147–8155
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.426/
Cite (ACL):
Romain Storaï, Jaeseong Lee, and Seung-won Hwang. 2025. Smarter, Not Harder: Training-Free Adaptive Computation for Transformers. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8147–8155, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Smarter, Not Harder: Training-Free Adaptive Computation for Transformers (Storaï et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.426.pdf