CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

Jeffrey George Wang; Jason Wang; Marvin Li; Seth Neel

Abstract

Membership inference attacks (MIAs) are a canonical way to assess a machine learning model’s privacy properties. Although several attempts have been made to evaluate MIAs on language models, the existing literature has struggled to construct clean evaluations for testing new techniques. In particular, subtle distribution shifts between member and non-member sets can undermine the statistical validity of MIA evaluations. Recent work has underscored this by showing that “blind” methods with no access to the underlying model can perform far better than published methods on the same benchmarks. This paper constructs a benchmark for principled evaluation of MIAs against LLMs. It relies on the insight that training data seen before and after a fixed point during training are drawn from the same distribution. Data consumed before a given checkpoint can therefore serve as members, and data scheduled after it can serve as non-members. As a result, any open-source model with intermediate checkpoints and public training data can be converted into an MIA testbed. We apply our framework to a half-dozen published attacks on the Pythia and OLMo families of models, ranging from 70M to 7B parameters. To facilitate further privacy research, we open-source a modular library for designing and implementing attacks in this setting: https://github.com/safr-ai-lab/pandora_llm.
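The checkpoint-based construction described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation or the pandora_llm API. It assumes that the training documents were shuffled and are listed in the exact order the model consumed them, and that each optimizer step uses a constant number of documents. The helper names `split_by_checkpoint`, `loss_attack_score`, and `roc_auc` are hypothetical. The scoring function is a simple loss-threshold attack chosen only for illustration, and it is not necessarily one of the six attacks the paper evaluates.

```python
from typing import List, Sequence, Tuple


def split_by_checkpoint(
    ordered_docs: Sequence[str],
    docs_per_step: int,
    checkpoint_step: int,
    n: int,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build member / non-member sets around a checkpoint.

    Assumes `ordered_docs` lists training documents in the exact order the
    model consumed them, at a constant `docs_per_step` per optimizer step.
    Documents before the boundary were seen by the checkpoint (members);
    documents after it had not yet been seen (non-members).
    """
    boundary = docs_per_step * checkpoint_step  # index of first unseen document
    if n > boundary or boundary + n > len(ordered_docs):
        raise ValueError("not enough documents on one side of the checkpoint")
    # Take the n documents immediately on each side of the boundary.
    members = list(ordered_docs[boundary - n:boundary])
    non_members = list(ordered_docs[boundary:boundary + n])
    return members, non_members


def loss_attack_score(token_logprobs: Sequence[float]) -> float:
    """Loss-threshold attack score: mean per-token log-probability.

    A higher score (lower loss) is treated as evidence of membership.
    """
    return sum(token_logprobs) / len(token_logprobs)


def roc_auc(member_scores: Sequence[float], non_member_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for x in non_member_scores:
            if m > x:
                wins += 1.0
            elif m == x:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(100)]
    members, non_members = split_by_checkpoint(docs, docs_per_step=10, checkpoint_step=5, n=3)
    print(members)      # ['doc47', 'doc48', 'doc49']
    print(non_members)  # ['doc50', 'doc51', 'doc52']
    # Toy per-token log-probabilities standing in for real model outputs.
    m_scores = [loss_attack_score(lp) for lp in ([-1.0, -2.0], [-0.5, -1.5])]
    nm_scores = [loss_attack_score(lp) for lp in ([-3.0, -3.0], [-2.0, -4.0])]
    print(roc_auc(m_scores, nm_scores))  # 1.0
```

If the shuffling assumption holds, any before/after split yields members and non-members from the same distribution. Drawing both sets from just either side of the boundary is an extra precaution in this sketch. It keeps the two sets adjacent in training order, so a blind baseline has little distributional signal to exploit.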

Anthology ID:: 2026.acl-short.30
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 364–370
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-short.30/
Cite (ACL):: Jeffrey George Wang, Jason Wang, Marvin Li, and Seth Neel. 2026. CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 364–370, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models (Wang et al., ACL 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-short.30.pdf
Checklist:: 2026.acl-short.30.checklist.pdf