Abstract
Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model’s API, determine whether the sample was in the model’s training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information under several kinds of membership inference attacks.
- Anthology ID:
- 2020.tacl-1.4
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 8
- Year:
- 2020
- Address:
- Cambridge, MA
- Editors:
- Mark Johnson, Brian Roark, Ani Nenkova
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 49–63
- URL:
- https://aclanthology.org/2020.tacl-1.4
- DOI:
- 10.1162/tacl_a_00299
- Cite (ACL):
- Sorami Hisamoto, Matt Post, and Kevin Duh. 2020. Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?. Transactions of the Association for Computational Linguistics, 8:49–63.
- Cite (Informal):
- Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? (Hisamoto et al., TACL 2020)
- PDF:
- https://aclanthology.org/2020.tacl-1.4.pdf
- Code
- sorami/TACL-Membership
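
The black-box setting described in the abstract can be sketched as follows. This is a hypothetical illustration of the attack's intuition, not the paper's actual method; the names `infer_membership`, `toy_api`, and the similarity threshold are assumptions for the example (see the `sorami/TACL-Membership` repository for the real data and code).

```python
# Hedged sketch of a black-box membership inference attack on a translation
# API: guess "member" when the API's output matches the candidate reference
# unusually well, on the intuition that models reproduce training
# translations more faithfully than unseen ones.
from difflib import SequenceMatcher


def output_similarity(hypothesis: str, reference: str) -> float:
    """Surface similarity between the model's output and the reference."""
    return SequenceMatcher(None, hypothesis.split(), reference.split()).ratio()


def infer_membership(translate_api, source: str, reference: str,
                     threshold: float = 0.8) -> bool:
    """Return True if (source, reference) is guessed to be in training data."""
    hypothesis = translate_api(source)  # only black-box API access is assumed
    return output_similarity(hypothesis, reference) >= threshold


# Toy black box that has "memorized" a single training pair.
def toy_api(src: str) -> str:
    return {"guten Morgen": "good morning"}.get(src, "hello")


print(infer_membership(toy_api, "guten Morgen", "good morning"))  # True
print(infer_membership(toy_api, "gute Nacht", "good night"))      # False
```

In practice the paper studies attacks far stronger than a fixed similarity threshold, but the interface is the same: the attacker sees only the API's output for a chosen input and must decide membership from it.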