Incremental pre-training from smaller language models

Han Zhang, Hui Wang, Ruifeng Xu


Abstract
Large language models have recently become a new learning paradigm and led to state-of-the-art performance across a range of tasks. With the explosive growth of open-source pre-trained models, it is worth investigating how to better utilize existing models. We propose a simple yet effective method, Incr-Pretrain, for incrementally pre-training language models from smaller, well-trained source models. We introduce different layer-wise transfer strategies for model augmentation, including parameter copying, initial-value padding, and model distillation. Experiments on multiple zero-shot learning tasks demonstrate satisfactory inference performance immediately after transfer and promising training efficiency during continued pre-training. Compared to training from scratch, Incr-Pretrain can save up to half the training time needed to reach a similar test loss.
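The abstract names three layer-wise transfer strategies; the sketch below is a rough, hypothetical illustration of how the first two (parameter copying and initial-value padding) might be combined when growing a smaller model's weights into a larger architecture. It is not the authors' implementation, and all function and variable names are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the paper's released code) of layer-wise
# parameter copying plus initial-value padding with PyTorch tensors.
import torch


def copy_and_pad(small_weight: torch.Tensor, large_shape: tuple,
                 init_std: float = 0.02) -> torch.Tensor:
    """Place a smaller weight tensor into the corner of a larger tensor;
    the remaining entries are filled with freshly initialized values."""
    large_weight = torch.randn(large_shape) * init_std      # initial-value padding
    slices = tuple(slice(0, s) for s in small_weight.shape)
    large_weight[slices] = small_weight                      # parameter copying
    return large_weight


def transfer_state_dict(small_sd: dict, large_sd: dict) -> dict:
    """Layer-wise transfer: for every parameter name shared by both models,
    copy the overlapping block and pad the rest; parameters that exist only
    in the larger model keep their original initialization."""
    new_sd = dict(large_sd)
    for name, small_param in small_sd.items():
        if name in large_sd:
            new_sd[name] = copy_and_pad(small_param, tuple(large_sd[name].shape))
    return new_sd
```

In this sketch the larger model would then be loaded with `transfer_state_dict(small_model.state_dict(), large_model.state_dict())` before continued pre-training; the distillation strategy mentioned in the abstract is not shown here.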
Anthology ID:
2024.sighan-1.5
Volume:
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Kam-Fai Wong, Min Zhang, Ruifeng Xu, Jing Li, Zhongyu Wei, Lin Gui, Bin Liang, Runcong Zhao
Venues:
SIGHAN | WS
Publisher:
Association for Computational Linguistics
Pages:
36–44
URL:
https://aclanthology.org/2024.sighan-1.5
Cite (ACL):
Han Zhang, Hui Wang, and Ruifeng Xu. 2024. Incremental pre-training from smaller language models. In Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10), pages 36–44, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Incremental pre-training from smaller language models (Zhang et al., SIGHAN-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.sighan-1.5.pdf