HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing

Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong


Abstract
Transformer-based large language models (LLM) have been widely used in language processing applications. However, due to the memory constraints of the devices, most of them restrict the context window. Even though recurrent models in previous works can memorize past tokens to enable unlimited context and maintain effectiveness, they have “flat” memory architectures. Such architectures have limitations in selecting and filtering information. Since humans are good at learning and self-adjustment, we believe that imitating brain memory hierarchy is beneficial for model memorization. Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that facilitates a model’s long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluating general language modeling, question-answering tasks, and the summarization task, we show that HMT consistently improves the long-context processing ability of existing models. Furthermore, HMT achieves a comparable or superior generation quality to long-context LLMs with 2 ∼ 57× fewer parameters and 2.5 ∼ 116× less inference memory, significantly outperforming previous memory-augmented models.
Anthology ID:
2025.naacl-long.410
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8068–8089
Language:
URL:
https://preview.aclanthology.org/moar-dois/2025.naacl-long.410/
DOI:
10.18653/v1/2025.naacl-long.410
Bibkey:
Cite (ACL):
Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, and Jason Cong. 2025. HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8068–8089, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing (He et al., NAACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/moar-dois/2025.naacl-long.410.pdf