Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Weiyan Shi, Ryan Shea, Si Chen, Chiyuan Zhang, Ruoxi Jia, Zhou Yu
Abstract
Protecting large language models from privacy leakage is becoming increasingly crucial with their wide adoption in real-world products. Yet applying *differential privacy* (DP), a canonical notion with provable privacy guarantees for machine learning models, to these models remains challenging due to the trade-off between model utility and privacy loss. Utilizing the fact that sensitive information in language data tends to be sparse, Shi et al. (2021) formalized a DP notion extension called *Selective Differential Privacy* (SDP) to protect only the sensitive tokens defined by a policy function. However, their algorithm only works for RNN-based models. In this paper, we develop a novel framework, *Just Fine-tune Twice* (JFT), that achieves SDP for state-of-the-art large transformer-based models. Our method is easy to implement: it first fine-tunes the model with *redacted* in-domain data, and then fine-tunes it again with the *original* in-domain data using a private training mechanism. Furthermore, we study the scenario where an imperfect policy-function implementation misses sensitive tokens, and develop systematic methods to handle it. Experiments show that our method achieves strong utility compared to previous baselines. We also analyze the SDP privacy guarantee empirically with the canary insertion attack.
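The two-phase recipe described in the abstract can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the regex-based policy function, the toy bigram model, the helper names (`policy_redact`, `dp_step`, `finetune`), and all hyperparameters are our own assumptions, not the authors' released implementation, and the DP-SGD step is a bare-bones stand-in for the private training mechanism used in the paper.

```python
import re
import torch
import torch.nn as nn

def policy_redact(text, mask="<MASK>"):
    # Toy policy function (assumption): treat digit runs such as PINs or
    # phone numbers as the sensitive tokens and mask them out.
    return re.sub(r"\d+", mask, text)

def make_batch(texts, vocab):
    # Bigram pairs (previous word -> next word) as index tensors.
    xs, ys = [], []
    for t in texts:
        words = t.split()
        for a, b in zip(words, words[1:]):
            xs.append(vocab[a])
            ys.append(vocab[b])
    return torch.tensor(xs), torch.tensor(ys)

def dp_step(model, per_example_losses, lr, clip=1.0, sigma=0.5):
    # Bare-bones DP-SGD update: clip each example's gradient to norm `clip`,
    # sum the clipped gradients, add Gaussian noise, and take a step.
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for loss in per_example_losses:
        grads = torch.autograd.grad(loss, list(model.parameters()), retain_graph=True)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        scale = min(1.0, clip / (float(norm) + 1e-12))
        for s, g in zip(summed, grads):
            s += scale * g
    n = len(per_example_losses)
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            p -= lr * (s + sigma * clip * torch.randn_like(s)) / n

def finetune(model, texts, vocab, private, epochs=50, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        x, y = make_batch(texts, vocab)
        losses = nn.functional.cross_entropy(model(x), y, reduction="none")
        if private:   # phase 2: noisy, clipped per-example updates
            dp_step(model, list(losses), lr)
        else:         # phase 1: ordinary SGD, sensitive tokens never seen
            opt.zero_grad()
            losses.mean().backward()
            opt.step()

texts = ["my pin is 1234", "the sky is blue", "call me at 5550199"]
all_words = {w for t in texts + [policy_redact(t) for t in texts] for w in t.split()}
vocab = {w: i for i, w in enumerate(sorted(all_words))}
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))

finetune(model, [policy_redact(t) for t in texts], vocab, private=False)  # pass 1: redacted data
finetune(model, texts, vocab, private=True)                               # pass 2: original data + DP
```

In a realistic setting, phase 2 would use a privacy-accountant-backed DP optimizer (e.g., Opacus) on a pretrained transformer rather than this hand-rolled step; the point of the sketch is only the ordering the paper's title names: ordinary fine-tuning on redacted text first, private fine-tuning on the original text second.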
Anthology ID:
2022.emnlp-main.425
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6327–6340
URL:
https://aclanthology.org/2022.emnlp-main.425
Cite (ACL):
Weiyan Shi, Ryan Shea, Si Chen, Chiyuan Zhang, Ruoxi Jia, and Zhou Yu. 2022. Just Fine-tune Twice: Selective Differential Privacy for Large Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6327–6340, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Just Fine-tune Twice: Selective Differential Privacy for Large Language Models (Shi et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.425.pdf