ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Leshem Choshen
Abstract
Pretraining has been shown to scale well with compute, data size and data diversity. Multitask learning trains on a mixture of supervised datasets and produces improved performance compared to self-supervised pretraining. Until now, massively multitask learning required simultaneous access to all datasets in the mixture and heavy compute resources that are only available to well-resourced teams. In this paper, we propose ColD Fusion, a method that provides the benefits of multitask learning but leverages distributed computation and requires limited communication and no sharing of data. Consequentially, ColD Fusion can create a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based on. We show that ColD Fusion yields comparable benefits to multitask training by producing a model that (a) attains strong performance on all of the datasets it was multitask trained on and (b) is a better starting point for finetuning on unseen datasets. We find ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, ColD Fusion-based model outperforms RoBERTa by 2.19 points on average without any changes to the architecture.- Anthology ID:
- 2023.acl-long.46
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 788–806
- Language:
- URL:
- https://aclanthology.org/2023.acl-long.46
- DOI:
- 10.18653/v1/2023.acl-long.46
- Cite (ACL):
- Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, and Leshem Choshen. 2023. ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 788–806, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning (Don-Yehiya et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.46.pdf