Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza


Abstract
Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Variations of masking during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match the task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe to obtain both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.
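For orientation, the sketch below illustrates the abstract's second recipe: warm-starting a seq2seq model's encoder from a pre-trained MLM encoder, training with the encoder frozen, and unfreezing it partway through training. This is a minimal illustration, not the authors' implementation; it assumes a PyTorch-style model with an `encoder` submodule and a forward pass that returns a scalar loss, and the names `warm_start_encoder`, `train_two_stage`, and the 50% unfreeze point are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the two-stage recipe:
# copy pre-trained MLM encoder weights into the seq2seq encoder,
# train with the encoder frozen, then unfreeze it partway through.
import torch
from torch import nn


def warm_start_encoder(seq2seq: nn.Module, mlm_encoder: nn.Module) -> None:
    """Copy pre-trained MLM encoder weights into the seq2seq encoder and freeze them."""
    seq2seq.encoder.load_state_dict(mlm_encoder.state_dict())
    for p in seq2seq.encoder.parameters():
        p.requires_grad = False  # stage 1: only the decoder is updated


def train_two_stage(seq2seq, loader, optimizer, total_steps, unfreeze_frac=0.5):
    """Run seq2seq pre-training, unfreezing the encoder partway through (assumed: halfway)."""
    unfreeze_step = int(unfreeze_frac * total_steps)
    for step, batch in enumerate(loader):
        if step == unfreeze_step:
            # stage 2: let the encoder adapt jointly with the decoder
            for p in seq2seq.encoder.parameters():
                p.requires_grad = True
        loss = seq2seq(**batch)  # assumed to return a scalar training loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if step + 1 >= total_steps:
            break
```

In this sketch the optimizer is constructed over all model parameters; while the encoder is frozen its parameters receive no gradients and are simply skipped, so the mid-training unfreeze needs no optimizer rebuild.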
Anthology ID:
2023.findings-acl.598
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9380–9394
URL:
https://aclanthology.org/2023.findings-acl.598
DOI:
10.18653/v1/2023.findings-acl.598
Cite (ACL):
Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, and Wael Hamza. 2023. Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9380–9394, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models (Soltan et al., Findings 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.598.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.598.mp4