LESA: Learnable LLM Layer Scaling-Up

Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Zhi Chen, Libo Qin, Hai Zhao


Abstract
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines, achieving superior performance with less than half the computational cost during continual pre-training. Extensive analyses demonstrate its effectiveness across different model sizes and tasks.
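To make the abstract's description more concrete, below is a minimal, illustrative sketch (not the authors' released code). It concatenates per-layer weight matrices, applies Singular Value Decomposition to expose low-dimensional structure shared across layers, and trains a small network to predict the parameters of a layer inserted between two adjacent layers. The toy shapes, the MLP predictor, and the "predict the middle layer of each consecutive triple" training objective are assumptions made purely for illustration.

```python
# Illustrative sketch of the idea in the abstract; architecture and objective are assumed.
import torch

torch.manual_seed(0)

n_layers, d_out, d_in, rank = 8, 64, 64, 6

# Toy stand-in for one weight matrix (e.g. an MLP projection) taken from each layer.
layer_weights = [torch.randn(d_out, d_in) for _ in range(n_layers)]

# Stack flattened layer weights: each row holds one layer's parameters.
W = torch.stack([w.flatten() for w in layer_weights])        # (n_layers, d_out * d_in)

# SVD over the layer dimension exposes structure shared across layers;
# each layer is summarized by a low-dimensional coefficient vector.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
coeffs = U[:, :rank] * S[:rank]                               # (n_layers, rank)
basis = Vh[:rank]                                             # (rank, d_out * d_in)

# Small predictor: coefficients of two surrounding layers -> coefficients of a
# layer to be inserted between them (training target here is the middle layer
# of each consecutive triple, an assumption for this sketch).
predictor = torch.nn.Sequential(
    torch.nn.Linear(2 * rank, 4 * rank),
    torch.nn.GELU(),
    torch.nn.Linear(4 * rank, rank),
)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

for _ in range(200):
    inputs = torch.cat([coeffs[:-2], coeffs[2:]], dim=-1)     # layers i and i+2
    target = coeffs[1:-1]                                     # layer i+1
    loss = torch.nn.functional.mse_loss(predictor(inputs), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Predict a new layer between layers k and k+1, then map it back to weight space.
k = 3
with torch.no_grad():
    new_coeff = predictor(torch.cat([coeffs[k], coeffs[k + 1]]))
new_weight = (new_coeff @ basis).reshape(d_out, d_in)         # init for the inserted layer
print(new_weight.shape)                                       # torch.Size([64, 64])
```

In the paper, the predicted layers serve as the initialization of the deeper model, which is then continually pre-trained; this toy example only illustrates the parameter-prediction step.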
Anthology ID: 2025.acl-long.1095
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 22463–22476
URL: https://preview.aclanthology.org/landing_page/2025.acl-long.1095/
Cite (ACL): Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Zhi Chen, Libo Qin, and Hai Zhao. 2025. LESA: Learnable LLM Layer Scaling-Up. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22463–22476, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LESA: Learnable LLM Layer Scaling-Up (Yang et al., ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-long.1095.pdf