Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models
Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza
Abstract
Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Variations of masking during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match the task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe to obtain both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.
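The two-stage recipe summarized in the abstract (warm-start a seq2seq model from a pre-trained MLM encoder, train with the encoder frozen, then unfreeze it partway through training) can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the module names, batch format, optimizer settings, and unfreezing schedule below are all hypothetical.

```python
# Minimal sketch (not the paper's code) of the two-stage warm-start recipe:
# stage 1 trains a seq2seq model whose encoder is initialized from a
# pre-trained MLM encoder and kept frozen; stage 2 unfreezes the encoder
# partway through training.
import torch
import torch.nn as nn


class WarmStartedSeq2Seq(nn.Module):
    """Wraps a pre-trained encoder and a freshly initialized decoder."""

    def __init__(self, pretrained_encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = pretrained_encoder  # warm-started weights
        self.decoder = decoder             # randomly initialized

    def forward(self, src, tgt):
        memory = self.encoder(src)         # schematic: encoder hidden states
        return self.decoder(tgt, memory)   # schematic: decoder logits


def set_encoder_trainable(model: WarmStartedSeq2Seq, trainable: bool) -> None:
    for param in model.encoder.parameters():
        param.requires_grad = trainable


def train(model, data_loader, loss_fn, total_steps, unfreeze_at_step):
    # Stage 1: encoder frozen, only the decoder is optimized.
    set_encoder_trainable(model, False)
    optimizer = torch.optim.AdamW(model.decoder.parameters(), lr=1e-4)

    step = 0
    for src, tgt, labels in data_loader:  # assumed batch format
        if step == unfreeze_at_step:
            # Stage 2: unfreeze the encoder and hand its parameters to the
            # optimizer as a new parameter group.
            set_encoder_trainable(model, True)
            optimizer.add_param_group({"params": model.encoder.parameters()})

        optimizer.zero_grad()
        loss = loss_fn(model(src, tgt), labels)
        loss.backward()
        optimizer.step()

        step += 1
        if step >= total_steps:
            break
```

Adding the encoder as a new parameter group at the switch point keeps the decoder's optimizer state intact; the paper's actual unfreezing point and optimizer configuration may differ.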
- Anthology ID: 2023.findings-acl.598
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9380–9394
- URL: https://aclanthology.org/2023.findings-acl.598
- DOI: 10.18653/v1/2023.findings-acl.598
- Cite (ACL): Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, and Wael Hamza. 2023. Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9380–9394, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models (Soltan et al., Findings 2023)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.598.pdf