ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Leshem Choshen


Abstract
Pretraining has been shown to scale well with compute, data size, and data diversity. Multitask learning trains on a mixture of supervised datasets and produces improved performance compared to self-supervised pretraining. Until now, massively multitask learning required simultaneous access to all datasets in the mixture and heavy compute resources that are only available to well-resourced teams. In this paper, we propose ColD Fusion, a method that provides the benefits of multitask learning but leverages distributed computation, requires limited communication, and involves no sharing of data. Consequently, ColD Fusion can create a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based on. We show that ColD Fusion yields comparable benefits to multitask training by producing a model that (a) attains strong performance on all of the datasets it was multitask trained on and (b) is a better starting point for finetuning on unseen datasets. We find that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, the ColD Fusion-based model outperforms RoBERTa by 2.19 points on average without any changes to the architecture.
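The abstract describes the collaborative loop only at a high level: contributors finetune copies of a shared model on their own data, and only the resulting weights, not the data, are communicated back and fused into the next shared model. The sketch below is a minimal PyTorch illustration of one way such a cycle could be instantiated; the fusion step is assumed here to be simple parameter averaging, and helper names such as `local_finetune`, `fuse`, and `contributor_loaders` are hypothetical, not taken from the paper's released code.

```python
# Minimal sketch of a collaborative-descent loop, assuming fusion by simple
# parameter averaging. Illustrative only; names are not from the paper's code.
import copy
import torch
import torch.nn as nn


def local_finetune(model: nn.Module, loader, epochs: int = 1, lr: float = 1e-3) -> nn.Module:
    """Each contributor finetunes a private copy of the shared model on local data."""
    model = copy.deepcopy(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model


def fuse(models: list[nn.Module]) -> nn.Module:
    """Fuse contributor models by averaging their parameters.
    (One possible fusion operator; buffers and optimizer state are ignored here.)"""
    fused = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return fused


def cold_fusion_loop(shared_model: nn.Module, contributor_loaders, iterations: int = 5) -> nn.Module:
    """Iterate: distribute the shared model, finetune locally, fuse, repeat.
    Only model weights are exchanged; raw data never leaves a contributor."""
    for _ in range(iterations):
        contributions = [local_finetune(shared_model, loader) for loader in contributor_loaders]
        shared_model = fuse(contributions)
    return shared_model
```

Because each round communicates only one set of weights per contributor, this kind of loop matches the abstract's claims of limited communication and no data sharing, while the repeated fuse-and-redistribute step is what lets finetuned models be recycled into an improved shared starting point.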
Anthology ID:
2023.acl-long.46
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
788–806
URL:
https://aclanthology.org/2023.acl-long.46
DOI:
10.18653/v1/2023.acl-long.46
Cite (ACL):
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, and Leshem Choshen. 2023. ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 788–806, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning (Don-Yehiya et al., ACL 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.46.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.46.mp4