Zheng Xu


2024

pdf
Can Public Large Language Models Help Private Cross-device Federated Learning?
Boxin Wang | Yibo Zhang | Yuan Cao | Bo Li | Hugh McMahan | Sewoong Oh | Zheng Xu | Manzil Zaheer
Findings of the Association for Computational Linguistics: NAACL 2024

We study (differentially) private federated learning (FL) of language models. The language models in cross-device FL are relatively small, which can be trained with meaningful formal user-level differential privacy (DP) guarantees when massive parallelism in training is enabled by the participation of a moderate size of users. Recently, public data has been used to improve privacy-utility trade-offs for both large and small language models. In this work, we provide a systematic study of using large-scale public data and LLMs to help differentially private training of on-device FL models, and further improve the privacy-utility tradeoff by techniques of distillation. Moreover, we propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution, which significantly improves the sample efficiency of (pre-)training on public data. The proposed method is efficient and effective for training private models by taking advantage of public data, especially for customized on-device architectures that do not have ready-touse pre-trained models.

2023

pdf
Federated Learning of Gboard Language Models with Differential Privacy
Zheng Xu | Yanxiang Zhang | Galen Andrew | Christopher Choquette | Peter Kairouz | Brendan Mcmahan | Jesse Rosenstock | Yuanbo Zhang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

We train and deploy language models (LMs) with federated learning (FL) and differential privacy (DP) in Google Keyboard (Gboard). The recent DP-Follow the Regularized Leader (DP-FTRL) algorithm is applied to achieve meaningfully formal DP guarantees without requiring uniform sampling of clients. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implication of its configuration in large scale systems. We show how quantile-based clip estimation can be combined with DP-FTRL to adaptively choose the clip norm during training or reduce the hyperparameter tuning in preparation of training. With the help of pretraining on public data, we trained and deployed more than fifteen Gboard LMs that achieve high utility and $\rho-$zCDP privacy guarantees with $\rho \in (0.3, 2)$, with one model additionally trained with secure aggregation. We summarize our experience and provide concrete suggestions on DP training for practitioners.