Yefeng Yuan


2026

Modern machine learning systems rely heavily on large-scale textual data that often contain sensitive personal information. Although conventional anonymization techniques remove explicit identifiers, textual data remain vulnerable to authorship inference attacks that exploit persistent stylometric signals. Recent approaches leverage Large Language Models (LLMs) to rewrite text and obscure such signals, but they have three shortcomings: they frequently overlook distinctive stylometric outliers, they fail to achieve a favorable privacy–utility trade-off because they apply rigid, one-size-fits-all obfuscation strategies, and they incur high computational costs. To address these challenges, we propose STAMP-R, a risk-adaptive reinforcement learning framework for instance-level authorship anonymization. We formulate anonymization as a risk-aware, instance-level style distribution shaping problem. Central to our approach is the Style Manifold Memory (SMM), which models the global stylistic landscape via prototype-based density estimation. SMM detects high-risk stylometric outliers and adaptively modulates a composite reward function, enabling stronger obfuscation for highly identifiable samples while preserving semantic fidelity for low-risk instances. We further distill a lightweight 3B-parameter model from a teacher LLM for efficient local deployment. Experiments show that STAMP-R reduces authorship re-identification risk while maintaining strong downstream utility.
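
To make the risk-adaptive idea concrete, the following is a minimal illustrative sketch, not the STAMP-R implementation. The abstract does not specify the density estimator or the form of the composite reward, so everything here is an assumption: the Gaussian kernel over prototypes, the `bandwidth` parameter, the convex privacy/utility mix, and the names `risk_score` and `composite_reward` are all hypothetical.

```python
import math


def risk_score(style_vec, prototypes, bandwidth=1.0):
    """Hypothetical risk in [0, 1): high when the style vector is far from all prototypes.

    Assumption: density is the mean Gaussian kernel between the sample and
    each style prototype. The abstract only says "prototype-based density
    estimation", so this kernel is a stand-in.
    """
    density = sum(
        math.exp(
            -sum((a - b) ** 2 for a, b in zip(style_vec, proto))
            / (2 * bandwidth ** 2)
        )
        for proto in prototypes
    ) / len(prototypes)
    # Each kernel value lies in (0, 1], so the mean density does too.
    return 1.0 - density


def composite_reward(privacy, utility, risk):
    """Hypothetical convex mix: higher risk puts more weight on the privacy term."""
    return risk * privacy + (1.0 - risk) * utility


# Toy 2-D style space with two prototypes (illustrative values only).
prototypes = [(0.0, 0.0), (1.0, 0.0)]
typical = (0.5, 0.0)   # close to the prototypes: low risk
outlier = (5.0, 5.0)   # far from every prototype: high risk

risk_typical = risk_score(typical, prototypes)
risk_outlier = risk_score(outlier, prototypes)
```

Under these assumptions, an outlier sample receives a reward dominated by the privacy term, so the policy is pushed toward stronger obfuscation, while a typical sample is rewarded mainly for preserving utility.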