LongHeads: Multi-Head Attention is Secretly a Long Context Processor

Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang


Abstract
Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently, due to limited length generalization and the quadratic computational cost of attention. Many approaches mitigate this by restricting the attention window to within the pre-trained length, but they introduce new issues such as ignoring the middle of the context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLMs' long-context ability by unlocking the untapped potential of multi-head attention. Instead of letting each head attend to the full sequence, which generalizes poorly to longer inputs because of out-of-distribution (OOD) issues, we let each head process an in-distribution length by selecting and attending to important context chunks. To this end, we propose a chunk selection strategy that relies on the inherent correlation between the query and key representations, efficiently distributing context chunks across heads. In this way, each head is guaranteed to process its attended tokens within the trained length, while different heads in different layers collectively process longer contexts. LongHeads works efficiently and fits seamlessly with many LLMs that use relative positional encoding. It achieves 100% accuracy at the 128k length on the passkey retrieval task, verifying its efficacy in extending the usable context window of existing models.
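The abstract's core mechanism is per-head chunk selection driven by query-key correlation. Below is a minimal, hypothetical sketch of that idea: it mean-pools each chunk's keys into a single representation, scores chunks against the current query independently per head, and keeps the top-k chunks so each head's attended length stays within the pre-trained window. The chunk representation, scoring rule, and function names here are illustrative assumptions, not the paper's exact formulation.

import torch

def select_chunks_per_head(query, keys, chunk_size=256, num_chunks_kept=8):
    """Hypothetical per-head chunk selection.
    query: (n_heads, d) current-step query per head.
    keys:  (n_heads, seq_len, d) cached keys for the long context.
    Returns indices of the chunks each head should attend to: (n_heads, k)."""
    n_heads, seq_len, d = keys.shape
    n_chunks = seq_len // chunk_size
    # Split the context into fixed-size chunks and mean-pool each chunk's keys
    # into one representative vector (an assumed, simple chunk representation).
    chunk_keys = keys[:, : n_chunks * chunk_size].view(n_heads, n_chunks, chunk_size, d)
    chunk_repr = chunk_keys.mean(dim=2)                      # (n_heads, n_chunks, d)
    # Score each chunk by its dot product with the query, independently per head,
    # exploiting the inherent query-key correlation mentioned in the abstract.
    scores = torch.einsum("hd,hcd->hc", query, chunk_repr)   # (n_heads, n_chunks)
    k = min(num_chunks_kept, n_chunks)
    return scores.topk(k, dim=-1).indices                    # (n_heads, k)

# Usage sketch: each head then attends only to its selected chunks (typically
# plus local/initial tokens), so its attended length remains in-distribution,
# while different heads collectively cover the full long context.
query = torch.randn(32, 128)           # 32 heads, head dim 128
keys = torch.randn(32, 8192, 128)      # an 8k-token cached context
print(select_chunks_per_head(query, keys).shape)  # torch.Size([32, 8])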
Anthology ID:
2024.findings-emnlp.417
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7136–7148
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.417/
DOI:
10.18653/v1/2024.findings-emnlp.417
Cite (ACL):
Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, and Xuanjing Huang. 2024. LongHeads: Multi-Head Attention is Secretly a Long Context Processor. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7136–7148, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
LongHeads: Multi-Head Attention is Secretly a Long Context Processor (Lu et al., Findings 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.417.pdf
Software:
2024.findings-emnlp.417.software.zip