Powerful Training-Free Membership Inference Against Fine-Tuned Autoregressive Language Models

David Ilić; David Stanojević; Kostadin Cvejoski

Abstract

Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any kind. On WikiText with GPT-2, EZ-MIA achieves 3.8× higher detection than the previous state-of-the-art under identical conditions (66.3% versus 17.5% true positive rate at 1% false positive rate), with near-perfect discrimination (AUC 0.98). At the stringent 0.1% FPR threshold critical for real-world auditing, we achieve 8× higher detection than prior work (14.0% versus 1.8%), requiring no reference model training. These gains extend to larger architectures: on AG News with Llama-2-7B, we achieve 3× higher detection (46.7% versus 15.8% TPR at 1% FPR). These results establish that privacy risks of fine-tuned language models are substantially greater than previously understood, with implications for both privacy auditing and deployment decisions. Code is available at https://github.com/JetBrains-Research/ez-mia.
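The abstract describes the EZ score only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation (see the linked repository for that). It assumes that per-token probabilities of the true next token are already available from the fine-tuned model and the pretrained reference model, that an "error position" is a token where the fine-tuned model's top prediction differs from the true token, and that "directional imbalance" means the share of total log-probability shift that points upward. The function name `ez_score`, its parameters, and the neutral fallback value of 0.5 are all hypothetical.

```python
import math
from typing import Sequence


def ez_score(
    p_target: Sequence[float],
    p_ref: Sequence[float],
    is_error: Sequence[bool],
    eps: float = 1e-12,
) -> float:
    """Illustrative directional-imbalance statistic (an assumption, not the paper's code).

    p_target[i]: probability the fine-tuned model assigns to the true token i.
    p_ref[i]:    probability the pretrained reference model assigns to it.
    is_error[i]: True where the fine-tuned model's top prediction is wrong.
    """
    up = 0.0    # total upward log-probability shift at error positions
    down = 0.0  # total downward log-probability shift at error positions
    for pt, pr, err in zip(p_target, p_ref, is_error):
        if not err:
            continue  # only error positions contribute
        shift = math.log(pt + eps) - math.log(pr + eps)
        if shift > 0:
            up += shift
        else:
            down -= shift
    total = up + down
    # With no error positions (or no shift at all) return a neutral value.
    return 0.5 if total == 0 else up / total
```

Under this reading, a score near 1 means the fine-tuned model raised the probability of the true token at almost every position it still gets wrong, which is the pattern the abstract associates with training members; a score near 0.5 means upward and downward shifts roughly cancel. Turning the score into a membership decision would still require a threshold calibrated to the target false positive rate.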

Anthology ID: 2026.acl-long.640
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14077–14093
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.640/
Cite (ACL): David Ilić, David Stanojević, and Kostadin Cvejoski. 2026. Powerful Training-Free Membership Inference Against Fine-Tuned Autoregressive Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14077–14093, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Powerful Training-Free Membership Inference Against Fine-Tuned Autoregressive Language Models (Ilić et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.640.pdf
Checklist: 2026.acl-long.640.checklist.pdf