Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training

Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun


Abstract
Training large language models (LLMs) heavily relies on distributed training strategies, among which pipeline parallelism (PP) plays a crucial role. As training sequences extend to 32k or even 128k tokens, current PP methods face severe bottlenecks, including substantial pipeline bubbles and high memory footprint, greatly hindering training throughput and model scalability. This paper introduces a sequence-level one-forward-one-backward (1F1B) PP method, named Seq1F1B, tailored for training LLMs on long sequences with high training throughput and memory efficiency. Unlike typical PP methods, which adopt a batch-level pipeline schedule, Seq1F1B schedules the pipeline of training LLMs at the sequence level. It uses a computation-aware strategy to partition sequences appropriately, significantly reducing pipeline bubbles and memory footprint. Compared to competitive PP baselines such as Megatron 1F1B PP, Seq1F1B achieves 1.14× the training throughput with half the memory footprint. Notably, Seq1F1B trains an LLM with 30B parameters on sequences up to 64k tokens using 64 NVIDIA A100 GPUs without using recomputation strategies, a feat unachievable with existing methods. We have released our code on GitHub to facilitate further research and development in LLM training on long sequences: https://github.com/thunlp/Seq1F1B.
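
To make the scheduling idea concrete, below is a minimal, illustrative Python sketch of a per-stage one-forward-one-backward (1F1B) schedule refined to sequence-segment granularity. It is not the authors' released implementation: the parameter names (num_stages, num_microbatches, num_segments) and the warm-up length are simplifying assumptions. Backward passes over segments run in reverse order because, under causal attention, later segments attend to earlier ones.

def seq1f1b_schedule(stage, num_stages, num_microbatches, num_segments):
    """Return a list of ('F' or 'B', microbatch, segment) ops for one pipeline stage."""
    # Work units: each (micro-batch, sequence segment) pair is scheduled on its own.
    fwd_queue = [(mb, seg) for mb in range(num_microbatches)
                 for seg in range(num_segments)]
    # Backward order reverses the segments within each micro-batch.
    bwd_queue = [(mb, seg) for mb in range(num_microbatches)
                 for seg in reversed(range(num_segments))]

    schedule = []
    # Warm-up: deeper stages issue fewer forwards before their first backward
    # (the exact count here is a simplification of the paper's schedule).
    num_warmup = min((num_stages - stage - 1) * num_segments, len(fwd_queue))
    for _ in range(num_warmup):
        schedule.append(("F", *fwd_queue.pop(0)))
    # Steady state: strictly alternate one forward with one backward (1F1B).
    while fwd_queue:
        schedule.append(("F", *fwd_queue.pop(0)))
        schedule.append(("B", *bwd_queue.pop(0)))
    # Cool-down: drain the remaining backwards.
    while bwd_queue:
        schedule.append(("B", *bwd_queue.pop(0)))
    return schedule

# Example: 4 stages, 4 micro-batches, each sequence split into 2 segments.
for op in seq1f1b_schedule(stage=0, num_stages=4,
                           num_microbatches=4, num_segments=2):
    print(op)

With these toy settings, stage 0 issues six warm-up forwards and then alternates forwards and backwards until the forward queue drains, illustrating how segment-level work units can fill bubbles that a batch-level 1F1B schedule would leave idle.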
Anthology ID:
2025.naacl-long.454
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
8998–9008
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.454/
Cite (ACL):
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, and Maosong Sun. 2025. Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8998–9008, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training (Ao et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.454.pdf