Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification

Tuc Nguyen, Yifan Hu, Thai Le


Abstract
Recent advancements in large language models (LLMs) have been fueled by large-scale training corpora drawn from diverse sources such as websites, news articles, and books. These datasets often contain explicit user information, such as personal names and addresses, that LLMs may unintentionally reproduce in their generated outputs. Beyond such explicit content, LLMs can also leak identity-revealing cues through implicit signals such as distinctive writing styles, raising significant concerns about authorship privacy. There are three major automated tasks in authorship privacy: authorship obfuscation (AO), authorship mimicking (AM), and authorship verification (AV). Prior research has studied AO, AM, and AV independently, but their interplay remains underexplored. This leaves a major research gap, especially in the era of LLMs, which are profoundly shaping how user-generated content is curated and shared and increasingly blurring the distinction between machine-generated and human-authored text. This work presents the first unified framework for analyzing the dynamic relationships among LLM-enabled AO, AM, and AV in the context of authorship privacy. We quantify how these tasks interact to transform human-authored text, examining their effects both at a single point in time and iteratively over time. We also examine how demographic metadata, such as gender and academic background, modulates their performance, inter-task dynamics, and privacy risks. The code is available at https://github.com/nguyentuc/authorship_privacy.
Anthology ID:
2025.emnlp-main.753
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
14914–14930
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.753/
Cite (ACL):
Tuc Nguyen, Yifan Hu, and Thai Le. 2025. Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14914–14930, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification (Nguyen et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.753.pdf
Checklist:
2025.emnlp-main.753.checklist.pdf