Personal Information Parroting in Language Models

Nishant Subramani, Kshitish Ghate, Mona T. Diab


Abstract
Modern language models (LMs) are trained on large scrapes of the Web that contain millions of instances of personal information (PI), many of which LMs memorize, increasing privacy risks. In this work, we develop the regexes and rules (R&R) detector suite to detect email addresses, phone numbers, and IP addresses; it outperforms the best regex-based PI detectors. On a manually curated set of 483 instances of PI, we measure memorization and find that 13.6% are parroted verbatim by the Pythia-6.9b model; that is, when the model is prompted with the tokens that precede the PI in the original document, greedy decoding generates the entire PI span exactly. We expand this analysis to models of varying sizes (160M-6.9B) and pretraining time steps (70k-143k iterations) in the Pythia model suite and find that both model size and amount of pretraining are positively correlated with memorization. Even the smallest model, Pythia-160m, parrots 2.7% of the instances exactly. Consequently, we strongly recommend that pretraining datasets be aggressively filtered and anonymized to minimize PI parroting.
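The parroting criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical callable `greedy_continue(prompt_tokens, n_new)` that returns a model's greedy continuation as token ids, and it assumes the comparison is made at the token level; the abstract does not specify either detail, nor how much preceding context is used. The toy "model" below is a stand-in lookup that has memorized one token sequence, so the example runs without any external library.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (an assumption, not from the paper): given prompt
# token ids and a number of new tokens, return the greedy continuation.
GreedyFn = Callable[[Sequence[int], int], List[int]]


def is_parroted(greedy_continue: GreedyFn,
                prefix: Sequence[int],
                pi_span: Sequence[int]) -> bool:
    """Return True if greedy decoding from `prefix` reproduces `pi_span` exactly."""
    if not pi_span:
        raise ValueError("pi_span must contain at least one token")
    continuation = greedy_continue(prefix, len(pi_span))
    # Exact match over the whole PI span; a partial match does not count.
    return list(continuation[:len(pi_span)]) == list(pi_span)


def parroting_rate(greedy_continue: GreedyFn,
                   instances: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Fraction of (prefix, pi_span) instances that are parroted verbatim."""
    if not instances:
        raise ValueError("instances must be non-empty")
    hits = sum(is_parroted(greedy_continue, p, s) for p, s in instances)
    return hits / len(instances)


def make_toy_model(memorized: Sequence[int]) -> GreedyFn:
    """Toy stand-in for an LM: continues `memorized` if the prompt is its prefix."""
    def greedy_continue(prompt: Sequence[int], n_new: int) -> List[int]:
        k = len(prompt)
        if list(memorized[:k]) == list(prompt):
            return list(memorized[k:k + n_new])
        return [0] * n_new  # unrelated output for unseen prompts
    return greedy_continue


if __name__ == "__main__":
    model = make_toy_model([5, 6, 7, 8, 9])
    instances = [([5, 6], [7, 8, 9]), ([1, 2], [3, 4])]
    print(parroting_rate(model, instances))
```

With a real model, `greedy_continue` would wrap the model's deterministic (non-sampling) generation call. Under this reading, the percentages reported in the abstract correspond to `parroting_rate` computed over the 483 curated instances.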
Anthology ID:
2026.findings-eacl.45
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
886–895
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.45/
Cite (ACL):
Nishant Subramani, Kshitish Ghate, and Mona T. Diab. 2026. Personal Information Parroting in Language Models. In Findings of the Association for Computational Linguistics: EACL 2026, pages 886–895, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Personal Information Parroting in Language Models (Subramani et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.45.pdf
Checklist:
 2026.findings-eacl.45.checklist.pdf