Scaling Language Model Size in Cross-Device Federated Learning

Jae Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Suresh, Shankar Kumar, Rajiv Mathews


Abstract
Most studies in cross-device federated learning focus on small models because of server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. By systematically applying partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we train a 21M-parameter Transformer that matches the perplexity of a similarly sized LSTM at ∼10× lower client-to-server communication cost, and that achieves 11% lower perplexity than the smaller LSTMs commonly studied in the literature.
Anthology ID:
2022.fl4nlp-1.2
Volume:
Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
FL4NLP
Publisher:
Association for Computational Linguistics
Pages:
6–20
URL:
https://aclanthology.org/2022.fl4nlp-1.2
DOI:
10.18653/v1/2022.fl4nlp-1.2
Cite (ACL):
Jae Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Suresh, Shankar Kumar, and Rajiv Mathews. 2022. Scaling Language Model Size in Cross-Device Federated Learning. In Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022), pages 6–20, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Scaling Language Model Size in Cross-Device Federated Learning (Ro et al., FL4NLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.fl4nlp-1.2.pdf