Long-Context Language Modeling with Parallel Context Encoding

Howard Yen, Tianyu Gao, Danqi Chen


Abstract
Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with the limited generalization of positional encoding, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE leverages a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to cross-attend to the additional contexts. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLaMA-2 to 128K tokens, offering 10× the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. It also excels in retrieval-augmented applications, where existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLaMA-2-Chat, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
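The abstract describes the core mechanism: a small encoder processes the long input in chunks, and a frozen decoder cross-attends to the resulting representations. Below is a minimal, self-contained PyTorch sketch of that idea only; it is not the authors' implementation, and the module choices, dimensions, and chunk size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChunkedCrossAttentionSketch(nn.Module):
    """Toy illustration: encode a long context in fixed-size chunks, then let
    decoder hidden states cross-attend to all encoded chunks at once.
    (A sketch of the general idea, not the paper's architecture.)"""

    def __init__(self, d_model=256, n_heads=4, chunk_len=64):
        super().__init__()
        self.chunk_len = chunk_len
        # Stand-in for the "small encoder": one bidirectional transformer layer.
        self.encoder = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Cross-attention from decoder states to the encoded context.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, decoder_hidden, context_emb):
        # decoder_hidden: (batch, tgt_len, d_model) hidden states of the decoder
        # context_emb:    (batch, ctx_len, d_model) embedded long context
        b, ctx_len, d = context_emb.shape
        pad = (-ctx_len) % self.chunk_len
        # Pad the sequence dimension to a multiple of chunk_len.
        ctx = F.pad(context_emb, (0, 0, 0, pad))
        # Split into chunks and encode each chunk independently, so positional
        # handling never spans more than one chunk length.
        chunks = ctx.reshape(-1, self.chunk_len, d)
        encoded = self.encoder(chunks).reshape(b, ctx_len + pad, d)[:, :ctx_len]
        # Decoder states attend over the concatenated chunk representations.
        attended, _ = self.cross_attn(decoder_hidden, encoded, encoded)
        return decoder_hidden + attended


if __name__ == "__main__":
    model = ChunkedCrossAttentionSketch()
    out = model(torch.randn(2, 16, 256), torch.randn(2, 300, 256))
    print(out.shape)  # torch.Size([2, 16, 256])
```

The point of the sketch is that each chunk is encoded independently (and thus in parallel), so only the cross-attention step ever sees the full context length, which is what makes scaling the context far beyond the decoder's trained window plausible.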
Anthology ID:
2024.acl-long.142
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2588–2610
URL:
https://aclanthology.org/2024.acl-long.142
Cite (ACL):
Howard Yen, Tianyu Gao, and Danqi Chen. 2024. Long-Context Language Modeling with Parallel Context Encoding. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2588–2610, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Long-Context Language Modeling with Parallel Context Encoding (Yen et al., ACL 2024)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/2024.acl-long.142.pdf