LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Zhiyuan Hu; Yuliang Liu; Jinman Zhao; Suyuchen Wang; WangYan WangYan; Wei Shen; Qing Gu; Luu Anh Tuan; See Kiong Ng; Zhiwei Jiang; Bryan Hooi

Abstract

Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs that combines impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model’s understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training. Furthermore, LongRecipe preserves the original LLM’s capabilities in general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training using a single GPU with 80G memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe.
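
The abstract does not spell out how position index transformation works, so the following is only a minimal sketch of the general idea of simulating a long sequence with a shorter one, not the authors' implementation. It assumes that a training sequence holding roughly 30% of the target window is split into contiguous chunks, and that random gaps are inserted between chunks so the position indices span the full target range. The function name `stretch_position_ids` and the parameters `num_chunks` and `seed` are hypothetical.

```python
import random


def stretch_position_ids(num_tokens, target_len, num_chunks=8, seed=0):
    """Return strictly increasing position ids in [0, target_len) for num_tokens tokens.

    Illustrative assumption: tokens inside a chunk keep consecutive ids,
    and random gaps between chunks spread the ids over the target window.
    """
    if not 0 < num_tokens <= target_len:
        raise ValueError("need 0 < num_tokens <= target_len")
    rng = random.Random(seed)
    num_chunks = max(1, min(num_chunks, num_tokens))

    # Split the tokens into near-equal contiguous chunks.
    base, extra = divmod(num_tokens, num_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(num_chunks)]

    # Distribute the unused positions as gaps placed before chunks 1..n-1.
    spare = target_len - num_tokens
    cuts = sorted(rng.randint(0, spare) for _ in range(num_chunks - 1))
    gaps = [0] + [b - a for a, b in zip([0] + cuts, cuts)]

    ids, pos = [], 0
    for size, gap in zip(sizes, gaps):
        pos += gap                      # skip positions to simulate distance
        ids.extend(range(pos, pos + size))
        pos += size
    return ids


# Example: a 128k target window simulated with 30% of the tokens.
target_len = 128 * 1024
num_tokens = target_len * 3 // 10
position_ids = stretch_position_ids(num_tokens, target_len)
```

Under these assumptions, the resulting `position_ids` could be passed to a model in place of the default `0..num_tokens-1` indices, so that attention and memory cost scale with the shorter sequence while the positional encoding sees distances up to the target length.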

Anthology ID: 2025.acl-long.581
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 11857–11870
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.581/
Cite (ACL): Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, WangYan WangYan, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, and Bryan Hooi. 2025. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11857–11870, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models (Hu et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.581.pdf
