@inproceedings{zhang-etal-2024-incremental,
title = "Incremental pre-training from smaller language models",
author = "Zhang, Han and
Wang, Hui and
Xu, Ruifeng",
editor = "Wong, Kam-Fai and
Zhang, Min and
Xu, Ruifeng and
Li, Jing and
Wei, Zhongyu and
Gui, Lin and
Liang, Bin and
Zhao, Runcong",
booktitle = "Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.sighan-1.5/",
pages = "36--44",
abstract = "Large language models have recently become a new learning paradigm and led to state-of-the-art performance across a range of tasks. As explosive open-source pre-trained models are available, it is worth investigating how to better utilize existing models. We propose a simple yet effective method, Incr-Pretrain, for incrementally pre-training language models from smaller well-trained source models. Different layer-wise transfer strategies were introduced for model augmentation including parameter copying, initial value padding, and model distillation. Experiments on multiple zero-shot learning tasks demonstrate satisfying inference performance upon transferring and promising training efficiency during continuing pre-training. Compared to training from scratch, Incr-Pretrain can save up to half the training time to get a similar testing loss."
}
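The abstract names three layer-wise transfer strategies. Below is a minimal sketch of how the first two, parameter copying and initial value padding, might be combined when growing a single weight matrix from a smaller source model into a larger target model. The function name, shapes, and use of NumPy are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the paper's code): copy a trained smaller
    # matrix into the top-left block of a larger, freshly initialized one;
    # the remaining entries keep their initial values ("initial value padding").
    import numpy as np

    def transfer_matrix(small_w: np.ndarray, large_init: np.ndarray) -> np.ndarray:
        out = large_init.copy()
        rows, cols = small_w.shape
        out[:rows, :cols] = small_w   # parameter copying from the source model
        return out                    # untouched entries act as padding

    # Toy usage: grow a 4x4 projection from a trained 2x2 one.
    rng = np.random.default_rng(0)
    small = rng.normal(size=(2, 2))                    # "well-trained" source weights
    large_init = rng.normal(scale=0.02, size=(4, 4))   # target model's initialization
    large = transfer_matrix(small, large_init)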