Tianyu Chen


2022

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
Tianyu Chen | Hangbo Bao | Shaohan Huang | Li Dong | Binxing Jiao | Daxin Jiang | Haoyi Zhou | Jianxin Li | Furu Wei
Findings of the Association for Computational Linguistics: ACL 2022

As more and more pre-trained language models adopt on-cloud deployment, privacy issues grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts). Privacy-preserving inference of transformer models is in high demand among cloud service users. To protect privacy, computing only on ciphertext with homomorphic encryption (HE) is an attractive choice. However, enabling inference of pre-trained models on ciphertext data is difficult because of the complex computations in transformer blocks, which are not yet supported by current HE tools. In this work, we introduce THE-X, an approximation approach for transformers that enables privacy-preserving inference of pre-trained models developed with popular frameworks. THE-X proposes a workflow to handle the complex computations in transformer networks, including all the non-polynomial functions such as GELU, softmax, and LayerNorm. Experiments reveal that THE-X can enable transformer inference on encrypted data for different downstream tasks, with negligible performance drop while enjoying theory-guaranteed privacy preservation.
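
As a rough illustration of the substitution idea the abstract describes (not the authors' exact construction), the Python sketch below fits a low-degree polynomial to GELU on a bounded input range; the resulting surrogate uses only additions and multiplications, which leveled HE schemes can evaluate on ciphertexts. The degree, fitting range, and least-squares procedure are illustrative assumptions, and THE-X's actual replacements for softmax and LayerNorm involve further constructions not shown here.

import numpy as np

def gelu(x):
    # Standard tanh approximation of GELU, used here only as the
    # plaintext reference function to fit against.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def fit_poly_surrogate(degree=2, lo=-4.0, hi=4.0, n=2001):
    # Least-squares fit of a low-degree polynomial to GELU on [lo, hi].
    # A polynomial needs only add/mul, so it can be evaluated under HE;
    # degree and range are illustrative choices, not the paper's values.
    xs = np.linspace(lo, hi, n)
    return np.polyfit(xs, gelu(xs), degree)

coeffs = fit_poly_surrogate()
probe = np.linspace(-4.0, 4.0, 9)
print("surrogate coefficients:", coeffs)
print("max abs error on probe:", np.abs(np.polyval(coeffs, probe) - gelu(probe)).max())

In a real deployment, a fitted surrogate like this would replace the activation inside the encrypted forward pass, and its quality would be judged by downstream task accuracy rather than pointwise error alone.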

2021

Pseudo-Label Guided Unsupervised Domain Adaptation of Contextual Embeddings
Tianyu Chen | Shaohan Huang | Furu Wei | Jianxin Li
Proceedings of the Second Workshop on Domain Adaptation for NLP

Contextual embedding models such as BERT can be easily fine-tuned on labeled samples to create state-of-the-art models for many downstream tasks. However, the fine-tuned BERT model degrades considerably when applied to a different domain for which only unlabeled data is available. In unsupervised domain adaptation, we aim to train a model that works well on a target domain when provided with labeled source samples and unlabeled target samples. In this paper, we propose a pseudo-label guided method for unsupervised domain adaptation. Two models are fine-tuned on labeled source samples as pseudo-labeling models. To learn representations of the target domain, one of these models is further adapted with masked language modeling on target-domain data. The two models are then used to assign pseudo-labels to target samples, and the final model is trained on those samples. We evaluate our method on named entity segmentation and sentiment analysis tasks, and the experiments show that our approach outperforms baseline methods.
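
To make the described workflow concrete, here is a minimal, hedged sketch using scikit-learn classifiers as stand-ins for contextual embedding models: two models are trained on labeled source data, target samples on which both agree and are confident receive pseudo-labels, and a final model is trained on those samples. The agreement-and-confidence filter and the 0.8 threshold are assumptions for illustration (the abstract only states that the two models assign pseudo-labels), and the masked-language-model adaptation step is indicated only by a comment because it has no scikit-learn analogue.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for a labeled source domain and an unlabeled target domain.
X_src, y_src = make_classification(n_samples=500, n_features=20, random_state=0)
X_tgt, _ = make_classification(n_samples=500, n_features=20, shift=0.5, random_state=1)

# Step 1: fine-tune two pseudo-labeling models on labeled source samples.
# (In the paper these are contextual embedding models; one of them would
# additionally be adapted with masked language modeling on target text
# before pseudo-labeling, a step omitted in this toy version.)
m1 = LogisticRegression(max_iter=1000).fit(X_src, y_src)
m2 = LogisticRegression(max_iter=1000, C=0.5).fit(X_src, y_src)

# Step 2: assign pseudo-labels to target samples on which both models
# agree and are confident; the 0.8 threshold is an assumed value.
p1, p2 = m1.predict_proba(X_tgt), m2.predict_proba(X_tgt)
agree = p1.argmax(axis=1) == p2.argmax(axis=1)
confident = np.minimum(p1.max(axis=1), p2.max(axis=1)) > 0.8
keep = agree & confident

# Step 3: train the final model on the pseudo-labeled target samples.
final = LogisticRegression(max_iter=1000).fit(X_tgt[keep], p1.argmax(axis=1)[keep])
print(f"pseudo-labeled {keep.sum()} of {len(X_tgt)} target samples")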