Jiakai Wang


2024

E2-LLM: Efficient and Extreme Length Extension of Large Language Models
Jiaheng Liu | Zhiqi Bai | Yuanxing Zhang | Chenchen Zhang | YuangZh | Ge Zhang | Jiakai Wang | Haoran Que | Yukang Chen | Wenbo Su | Tiezheng Ge | Jie Fu | Wenhu Chen | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2024

Training Large Language Models (LLMs) to process long contexts incurs prohibitive computational costs. Prevailing techniques for extending the context capabilities of LLMs typically require not only additional training procedures but also access to long-context datasets (e.g., sequences of 32K tokens), which presupposes substantial GPU expenditure. To address these issues, we introduce a novel solution named Efficient and Extreme length extension for Large Language Models (E2-LLM). E2-LLM requires only a single training procedure over considerably shorter sequences (e.g., 4K tokens), which greatly reduces the cost of continual pre-training or fine-tuning. During training, we apply a dual augmentation strategy to Rotary Position Embeddings (RoPE) that varies the scale and the position indices across training samples. E2-LLM is designed to enhance the model’s robustness to diverse relative positions. Experimental results on multiple benchmark datasets demonstrate the superior performance of E2-LLM on demanding long-context tasks.
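The abstract does not give the formulas behind the dual augmentation, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a position p is mapped to (p + offset) / scale before the standard RoPE frequencies are applied, and that one (scale, offset) pair is drawn per training sample. The function names, the sampling ranges, and the base of 10000 are all hypothetical choices made for illustration.

```python
import math
import random


def rope_angles(seq_len, dim, scale=1.0, offset=0, base=10000.0):
    """Rotation angles for RoPE with an illustrative scale/offset augmentation.

    Assumption (not taken from the abstract): position p is mapped to
    (p + offset) / scale before multiplying by the usual inverse frequencies.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[((p + offset) / scale) * f for f in inv_freq] for p in range(seq_len)]


def apply_rope(vec, angles):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the given angles."""
    out = []
    for i, a in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)]
    return out


def rotated_dot(q, k, m, n, scale=1.0, offset=0):
    """Dot product of q rotated at position m with k rotated at position n."""
    angles = rope_angles(max(m, n) + 1, len(q), scale, offset)
    q_rot, k_rot = apply_rope(q, angles[m]), apply_rope(k, angles[n])
    return sum(a * b for a, b in zip(q_rot, k_rot))


def sample_augmentation(rng, max_scale=8, max_offset=4096):
    """Draw one (scale, offset) pair per training sample (hypothetical ranges)."""
    return rng.randint(1, max_scale), rng.randint(0, max_offset)


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]
    k = [1.0, 0.4, -0.6, 0.9]
    scale, offset = sample_augmentation(random.Random(0))
    print(scale, offset, rotated_dot(q, k, 5, 2, scale, offset))
```

Under this assumed form, the rotated dot product depends only on the scaled relative distance (m - n) / scale: a shared offset shifts the absolute indices a short sequence is trained on, while the scale changes how finely relative distances are spaced. Both effects are consistent with the stated goal of robustness to diverse relative positions.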