LLoCO: Learning Long Contexts Offline
Sijun Tan, Xiuyu Li, Shishir G Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
Abstract
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a novel approach that addresses this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to a 7.62× speed-up during inference and 11.52× higher throughput during finetuning, substantially reducing the cost of long-document question answering. This makes it a promising solution for efficient long-context processing.
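The abstract describes a three-step pipeline: compress a long document offline, finetune the LLM on that domain with LoRA, and answer questions from the compressed representation instead of the raw context. Below is a minimal sketch of that flow using HuggingFace `transformers` and `peft`; `compress_context` and its strided-pooling body are hypothetical placeholders (the abstract does not specify the actual context encoder), and the 30:1 compression ratio simply mirrors the "30× fewer tokens" figure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(MODEL)
base = AutoModelForCausalLM.from_pretrained(MODEL)

# --- Offline stage 1: compress the long document into a short sequence. ---
def compress_context(document: str, ratio: int = 30) -> torch.Tensor:
    """Placeholder context encoder: strided pooling over last hidden states.

    The abstract only says LLoCO builds a concise representation using
    ~30x fewer tokens; the real encoder is not specified here, so this
    naive pooling is a stand-in, not the paper's method.
    """
    ids = tok(document, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = base(ids, output_hidden_states=True).hidden_states[-1]
    return hidden[:, ::ratio, :]  # shape: (1, seq_len // ratio, hidden_dim)

# --- Offline stage 2: in-domain parameter-efficient finetuning with LoRA. ---
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg)  # only the LoRA adapters are trainable
# ... finetune `model` on (compressed context, question, answer) examples ...

# --- Online stage: answer from the compressed context, not the raw text. ---
def answer(doc_embeds: torch.Tensor, question: str) -> str:
    q_ids = tok(question, return_tensors="pt").input_ids
    q_embeds = model.get_input_embeddings()(q_ids)
    inputs = torch.cat([doc_embeds, q_embeds], dim=1)
    out = model.generate(inputs_embeds=inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)
```

Because the question is answered against a ~30× shorter input, both the attention cost and the KV cache shrink accordingly, which is where the reported inference speed-up comes from.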
- Anthology ID:
- 2024.emnlp-main.975
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17605–17621
- URL:
- https://aclanthology.org/2024.emnlp-main.975
- DOI:
- 10.18653/v1/2024.emnlp-main.975
- Cite (ACL):
- Sijun Tan, Xiuyu Li, Shishir G Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, and Raluca Ada Popa. 2024. LLoCO: Learning Long Contexts Offline. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17605–17621, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- LLoCO: Learning Long Contexts Offline (Tan et al., EMNLP 2024)
- PDF:
- https://aclanthology.org/2024.emnlp-main.975.pdf