Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches

Noopur Zambare, Kiana Aghakasiri, Carissa Lin, Carrie Ye, J Ross Mitchell, Mohamed Abdalla


Abstract
Large language models (LLMs) have shown strong performance on clinical de-identification, the task of identifying sensitive identifiers to protect privacy. However, previous work has not examined their generalizability across formats, cultures, and genders. In this work, we systematically evaluate fine-tuned transformer models (BERT, ClinicalBERT, ModernBERT), small LLMs (Llama 1-8B, Qwen 1.5-7B), and large LLMs (Llama-70B, Qwen-72B) on de-identification. We show that smaller models achieve comparable performance while substantially reducing inference cost, making them more practical for deployment. Moreover, we demonstrate that smaller models can be fine-tuned with limited data to outperform larger models in de-identifying identifiers drawn from Mandarin, Hindi, Spanish, French, Bengali, and regional variations of English, as well as gendered names. To improve robustness in multi-cultural contexts, we introduce and publicly release BERT-MultiCulture-DEID, a set of de-identification models based on BERT, ClinicalBERT, and ModernBERT, fine-tuned on MIMIC with identifiers from multiple language variants. Our findings provide the first comprehensive quantification of the efficiency-generalizability trade-off in de-identification and establish practical pathways for fair and efficient clinical de-identification. Details on accessing the models are available at: https://doi.org/10.5281/zenodo.18342291
Anthology ID:
2026.findings-eacl.222
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4242–4257
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.222/
Cite (ACL):
Noopur Zambare, Kiana Aghakasiri, Carissa Lin, Carrie Ye, J Ross Mitchell, and Mohamed Abdalla. 2026. Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4242–4257, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches (Zambare et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.222.pdf
Checklist:
2026.findings-eacl.222.checklist.pdf