ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Abstract
Current pre-trained language models (PLM) are typically trained with static data, ignoring that in real-world scenarios, streaming data of various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM’s width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment ELLE with streaming data from 5 domains on BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performances. The codes are publicly available at https://github.com/thunlp/ELLE.- Anthology ID:
- 2022.findings-acl.220
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 2789–2810
- Language:
- URL:
- https://aclanthology.org/2022.findings-acl.220
- DOI:
- 10.18653/v1/2022.findings-acl.220
- Cite (ACL):
- Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie Zhou. 2022. ELLE: Efficient Lifelong Pre-training for Emerging Data. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2789–2810, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- ELLE: Efficient Lifelong Pre-training for Emerging Data (Qin et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2022.findings-acl.220.pdf
- Code
- thunlp/elle