EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices
Jiyu Chen, Shuang Peng, Daxiong Luo, Fan Yang, Renshou Wu, Fangyuan Li, Xiaoxin Chen
Abstract
Transformer-based large language models (LLMs) encounter challenges in processing long sequences on edge devices due to the quadratic complexity of attention mechanisms and the growing memory demands of the Key-Value (KV) cache. Existing KV cache optimizations struggle with irreversible token eviction in long-output tasks, while alternative sequence-modeling architectures are costly to adopt within established Transformer infrastructure. We present EdgeInfinite, a memory-efficient solution for infinite contexts that integrates compressed memory into Transformer-based LLMs through a trainable memory-gating module. The approach maintains full compatibility with standard Transformer architectures, requires fine-tuning only a small subset of parameters, and enables selective activation of the memory-gating module to route between long- and short-context tasks. Experimental results show that EdgeInfinite achieves performance comparable to baseline Transformer-based LLMs on long-context benchmarks while reducing memory consumption and time to first token.
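As a conceptual illustration only (not the authors' released code): below is a minimal PyTorch sketch of how a trainable memory-gating module might blend standard local-attention output with output retrieved from a compressed memory. The class name, tensor shapes, and the sigmoid-gate formulation are assumptions based on the abstract.

```python
import torch
import torch.nn as nn

class MemoryGate(nn.Module):
    """Hypothetical memory-gating module (illustrative sketch, not the
    authors' implementation). Blends the output of local attention with
    the output retrieved from a compressed memory via a learned gate."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Learned per-channel gate; this small parameter set is the only
        # part assumed to be fine-tuned while the backbone stays frozen.
        self.gate = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, local_out: torch.Tensor, memory_out: torch.Tensor) -> torch.Tensor:
        # local_out:  [batch, seq_len, hidden_dim] from windowed attention
        # memory_out: [batch, seq_len, hidden_dim] from compressed memory
        g = torch.sigmoid(self.gate)  # gate values in (0, 1)
        return g * memory_out + (1.0 - g) * local_out
```

A per-layer gate of this kind leaves the backbone weights untouched, which is consistent with the abstract's claim that only a small subset of parameters needs fine-tuning; short-context requests could simply bypass the memory path so the model behaves as a standard Transformer.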
- Anthology ID: 2025.acl-industry.40
- Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Georg Rehm, Yunyao Li
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 568–575
- URL: https://preview.aclanthology.org/corrections-2025-08/2025.acl-industry.40/
- DOI: 10.18653/v1/2025.acl-industry.40
- Cite (ACL): Jiyu Chen, Shuang Peng, Daxiong Luo, Fan Yang, Renshou Wu, Fangyuan Li, and Xiaoxin Chen. 2025. EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 568–575, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices (Chen et al., ACL 2025)
- PDF: https://preview.aclanthology.org/corrections-2025-08/2025.acl-industry.40.pdf