STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Kazuki Yano, Takumi Ito, Jun Suzuki


Abstract
Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model weights. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models pre-trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.
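
To make the idea concrete, the following is a minimal conceptual sketch (not the authors' released code) of one STEP-style growth stage in PyTorch, assuming depth-wise growth of a toy feed-forward stack and LoRA-style low-rank adapters attached to the frozen, previously trained blocks. The names LowRankAdapter, make_block, and grow_and_freeze are hypothetical and introduced here only for illustration.

    # Conceptual sketch only: freeze the stage-1 blocks, add trainable low-rank
    # adapters to them, then grow the model with newly initialized blocks.
    import torch.nn as nn


    class LowRankAdapter(nn.Module):
        """Adds a trainable low-rank update to the output of a frozen linear layer."""

        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base                                      # frozen weights
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.up.weight)                        # start as a no-op update

        def forward(self, x):
            return self.base(x) + self.up(self.down(x))


    def make_block(d_model: int) -> nn.Sequential:
        return nn.Sequential(nn.Linear(d_model, 4 * d_model),
                             nn.GELU(),
                             nn.Linear(4 * d_model, d_model))


    def grow_and_freeze(model: nn.Sequential, new_layers: int, d_model: int, rank: int = 8):
        """Freeze already-trained blocks, wrap their linear sublayers with
        low-rank adapters, and append newly initialized, fully trainable blocks."""
        for p in model.parameters():                              # 1) freeze stage-1 weights
            p.requires_grad = False
        for block in model:                                       # 2) attach trainable adapters
            for idx, module in enumerate(block):
                if isinstance(module, nn.Linear):
                    block[idx] = LowRankAdapter(module, rank)
        grown = list(model) + [make_block(d_model) for _ in range(new_layers)]  # 3) grow in depth
        return nn.Sequential(*grown)


    if __name__ == "__main__":
        d_model = 64
        stage1 = nn.Sequential(*[make_block(d_model) for _ in range(2)])   # small stage-1 model
        stage2 = grow_and_freeze(stage1, new_layers=2, d_model=d_model)
        trainable = sum(p.numel() for p in stage2.parameters() if p.requires_grad)
        total = sum(p.numel() for p in stage2.parameters())
        print(f"trainable parameters: {trainable} / {total}")

In a setup like this, only the adapters and the newly added blocks receive gradients and optimizer states in the later stage, which is the kind of reduction in maximum memory requirements the abstract reports; the paper should be consulted for the exact growth schedule and adapter configuration.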
Anthology ID:
2025.naacl-short.32
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
374–384
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.32/
Cite (ACL):
Kazuki Yano, Takumi Ito, and Jun Suzuki. 2025. STEP: Staged Parameter-Efficient Pre-training for Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 374–384, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
STEP: Staged Parameter-Efficient Pre-training for Large Language Models (Yano et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.32.pdf