Multi-Token Completion for Text Anonymization
Pulkit Madaan, Krithika Ramesh, Lisa Bauer, Charith Peris, Anjalie Field
Abstract
Text anonymization is a critical task for enabling research and development in high-stakes domains containing private data, like medicine, law, and social services. While much research has focused on redacting sensitive content from text, substantially less work has focused on what to replace redacted content with, which can enhance privacy and becomes increasingly important with greater levels of redaction. In this work, we formulate predicting replacements for sensitive spans as a research task with principled use-inspired evaluation criteria. We further propose a multi-token completion method for accomplishing this task that is designed to preserve consistency with low compute requirements, thus allowing practitioners to anonymize data locally before sharing it externally. Human and automated annotations demonstrate that our approach produces more realistic text and better preserves utility than alternative infilling methods and differentially private mechanisms across multiple domains without retraining. Overall, our work explores the under-studied task of what to replace redacted content with and contributes grounded evaluations capturing utility, facilitating future work.
- Anthology ID: 2026.eacl-long.276
- Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 5894–5908
- URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.276/
- Cite (ACL): Pulkit Madaan, Krithika Ramesh, Lisa Bauer, Charith Peris, and Anjalie Field. 2026. Multi-Token Completion for Text Anonymization. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5894–5908, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): Multi-Token Completion for Text Anonymization (Madaan et al., EACL 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.276.pdf