Adversarial Authorship Attribution for Deobfuscation

Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan


Abstract
Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results motivate the need to develop authorship obfuscation approaches that are resistant to deobfuscation.
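The idea of an attributor that is "aware of potential obfuscation" can be illustrated with a toy sketch. This is not the authors' implementation: the synonym-swap obfuscator, the vocabulary-overlap attributor, the author names, and the example sentences are all hypothetical stand-ins chosen to keep the example small. It assumes adversarial training means adding obfuscated copies of the training documents, with their original author labels, to the training set.

```python
# Hypothetical synonym-swap obfuscator; real obfuscators are far more complex.
SYNONYMS = {"big": "large", "quick": "fast"}


def obfuscate(text):
    """Replace each word that has a listed synonym; leave other words as-is."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


def train(docs):
    """Build one vocabulary set per author from (text, author) pairs."""
    vocab = {}
    for text, author in docs:
        vocab.setdefault(author, set()).update(text.split())
    return vocab


def attribute(vocab, text):
    """Return the author whose vocabulary covers the most distinct words in text."""
    words = set(text.split())
    return max(vocab, key=lambda author: len(words & vocab[author]))


train_docs = [
    ("the big dog made a quick run", "alice"),
    ("his large cat took long fast naps", "bob"),
]

# Baseline attributor: trained on the original documents only.
baseline = train(train_docs)

# Adversarially trained attributor: also sees obfuscated copies, same labels.
adversarial = train(train_docs + [(obfuscate(t), a) for t, a in train_docs])

# An unseen document by "alice", passed through the obfuscator.
unseen = obfuscate("big bird made quick turn")

print(attribute(baseline, unseen))     # bob (the obfuscation succeeds)
print(attribute(adversarial, unseen))  # alice (the obfuscation is undone)
```

In this toy setting the obfuscator fools the baseline because the swapped words match the other author's vocabulary, while the adversarially trained attributor has already seen those swaps under the correct label.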
Anthology ID:
2022.acl-long.509
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7372–7384
URL:
https://aclanthology.org/2022.acl-long.509
DOI:
10.18653/v1/2022.acl-long.509
Cite (ACL):
Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, and Padmini Srinivasan. 2022. Adversarial Authorship Attribution for Deobfuscation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7372–7384, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Adversarial Authorship Attribution for Deobfuscation (Zhai et al., ACL 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.acl-long.509.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2022.acl-long.509.mp4
Code:
reginazhai/authorship-deobfuscation