Abstract
Large language models have recently become a new learning paradigm and led to state-of-the-art performance across a range of tasks. With the explosive growth of open-source pre-trained models, it is worth investigating how to better utilize existing models. We propose a simple yet effective method, Incr-Pretrain, for incrementally pre-training language models from smaller, well-trained source models. Several layer-wise transfer strategies are introduced for model augmentation, including parameter copying, initial value padding, and model distillation. Experiments on multiple zero-shot learning tasks demonstrate satisfactory inference performance immediately after transfer and promising training efficiency during continued pre-training. Compared to training from scratch, Incr-Pretrain can save up to half the training time to reach a similar test loss.
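The abstract names parameter copying and initial value padding as layer-wise transfer strategies for model augmentation. The following is a minimal, hypothetical sketch (not the paper's released code) of how those two ideas could be combined when widening a single linear layer in PyTorch; the function name `expand_linear` and the specific padding scheme are assumptions made for illustration only.

```python
# Hypothetical sketch, not the authors' implementation: expand a smaller
# trained linear layer into a larger one via parameter copying plus
# initial value padding for the newly introduced parameters.
import torch
import torch.nn as nn

def expand_linear(small: nn.Linear, in_features: int, out_features: int,
                  pad_std: float = 0.02) -> nn.Linear:
    """Copy the trained weights of `small` into the top-left block of a
    larger layer; fill the remaining entries with small random initial values."""
    large = nn.Linear(in_features, out_features)
    with torch.no_grad():
        # Initial value padding: fresh small-scale init for new parameters.
        large.weight.normal_(mean=0.0, std=pad_std)
        large.bias.zero_()
        # Parameter copying: reuse the source model's trained block.
        large.weight[:small.out_features, :small.in_features] = small.weight
        large.bias[:small.out_features] = small.bias
    return large

# Example: widen a 256->256 projection from the source model to 512->512
# before continuing pre-training of the augmented model.
src = nn.Linear(256, 256)
tgt = expand_linear(src, in_features=512, out_features=512)
```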
- Anthology ID:
- 2024.sighan-1.5
- Volume:
- Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Kam-Fai Wong, Min Zhang, Ruifeng Xu, Jing Li, Zhongyu Wei, Lin Gui, Bin Liang, Runcong Zhao
- Venues:
- SIGHAN | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36–44
- URL:
- https://aclanthology.org/2024.sighan-1.5
- Cite (ACL):
- Han Zhang, Hui Wang, and Ruifeng Xu. 2024. Incremental pre-training from smaller language models. In Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10), pages 36–44, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Incremental pre-training from smaller language models (Zhang et al., SIGHAN-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.sighan-1.5.pdf