STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Kazuki Yano, Takumi Ito, Jun Suzuki


Abstract
Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model weights. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models pre-trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.
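
To make the idea concrete, the following is a minimal conceptual sketch (not the authors' released code) of one STEP-style growth stage in PyTorch, assuming depth-wise growth of a toy feed-forward stack and LoRA-style low-rank adapters attached to the frozen, previously trained blocks. The names LowRankAdapter, make_block, and grow_and_freeze are hypothetical and introduced here only for illustration.

    # Conceptual sketch only: freeze the stage-1 blocks, add trainable low-rank
    # adapters to them, then grow the model with newly initialized blocks.
    import torch.nn as nn


    class LowRankAdapter(nn.Module):
        """Adds a trainable low-rank update to the output of a frozen linear layer."""

        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base                                      # frozen weights
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.up.weight)                        # start as a no-op update

        def forward(self, x):
            return self.base(x) + self.up(self.down(x))


    def make_block(d_model: int) -> nn.Sequential:
        return nn.Sequential(nn.Linear(d_model, 4 * d_model),
                             nn.GELU(),
                             nn.Linear(4 * d_model, d_model))


    def grow_and_freeze(model: nn.Sequential, new_layers: int, d_model: int, rank: int = 8):
        """Freeze already-trained blocks, wrap their linear sublayers with
        low-rank adapters, and append newly initialized, fully trainable blocks."""
        for p in model.parameters():                              # 1) freeze stage-1 weights
            p.requires_grad = False
        for block in model:                                       # 2) attach trainable adapters
            for idx, module in enumerate(block):
                if isinstance(module, nn.Linear):
                    block[idx] = LowRankAdapter(module, rank)
        grown = list(model) + [make_block(d_model) for _ in range(new_layers)]  # 3) grow in depth
        return nn.Sequential(*grown)


    if __name__ == "__main__":
        d_model = 64
        stage1 = nn.Sequential(*[make_block(d_model) for _ in range(2)])   # small stage-1 model
        stage2 = grow_and_freeze(stage1, new_layers=2, d_model=d_model)
        trainable = sum(p.numel() for p in stage2.parameters() if p.requires_grad)
        total = sum(p.numel() for p in stage2.parameters())
        print(f"trainable parameters: {trainable} / {total}")

In a setup like this, only the adapters and the newly added blocks receive gradients and optimizer states in the later stage, which is the kind of reduction in maximum memory requirements the abstract reports; the paper should be consulted for the exact growth schedule and adapter configuration.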
Anthology ID:
2025.naacl-short.32
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
374–384
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.32/
Cite (ACL):
Kazuki Yano, Takumi Ito, and Jun Suzuki. 2025. STEP: Staged Parameter-Efficient Pre-training for Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 374–384, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
STEP: Staged Parameter-Efficient Pre-training for Large Language Models (Yano et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.32.pdf