Scaling Language Model Size in Cross-Device Federated Learning
Jae Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Suresh, Shankar Kumar, Rajiv Mathews
Abstract
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with ∼10× smaller client-to-server communication cost and 11% lower perplexity than smaller LSTMs commonly studied in the literature.
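The abstract names partial model training and quantization as the main levers for shrinking client-to-server communication. The Python sketch below is a hypothetical illustration, not the authors' code: the function names, the uniform 8-bit quantization scheme, and the layer-dict representation are all assumptions, intended only to show how transmitting just the trained layers at reduced precision cuts the upload size.

```python
import numpy as np

# Hypothetical sketch (not from the paper): a client uploads updates only for
# the layers it trains (partial model training) and uniformly quantizes those
# updates before sending them to the server.

def client_update(global_weights, local_weights, trainable_layers, num_bits=8):
    """Return a compressed client-to-server payload.

    global_weights / local_weights: dicts mapping layer name -> np.ndarray.
    trainable_layers: names of the layers this client actually trained;
        frozen layers are simply not transmitted.
    num_bits: precision of the transmitted deltas (assumed 8 bits here).
    """
    payload = {}
    levels = 2 ** num_bits - 1
    for name in trainable_layers:
        delta = local_weights[name] - global_weights[name]
        lo, hi = float(delta.min()), float(delta.max())
        scale = (hi - lo) / levels if hi > lo else 1.0
        # Store low-precision integers plus the dequantization parameters.
        q = np.round((delta - lo) / scale).astype(
            np.uint8 if num_bits <= 8 else np.uint16)
        payload[name] = (q, lo, scale)
    return payload

def server_dequantize(payload):
    """Reconstruct approximate per-layer deltas on the server."""
    return {name: q.astype(np.float32) * scale + lo
            for name, (q, lo, scale) in payload.items()}
```

In this sketch the savings come from two sources: frozen layers are never uploaded, and each transmitted value shrinks from a 32-bit float to a low-bit integer. A server following federated averaging would then aggregate the dequantized deltas across clients for the trainable subset only.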
- Anthology ID: 2022.fl4nlp-1.2
- Volume: Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Venue: FL4NLP
- Publisher: Association for Computational Linguistics
- Pages: 6–20
- URL: https://aclanthology.org/2022.fl4nlp-1.2
- DOI: 10.18653/v1/2022.fl4nlp-1.2
- Cite (ACL): Jae Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Suresh, Shankar Kumar, and Rajiv Mathews. 2022. Scaling Language Model Size in Cross-Device Federated Learning. In Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022), pages 6–20, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): Scaling Language Model Size in Cross-Device Federated Learning (Ro et al., FL4NLP 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.fl4nlp-1.2.pdf