@inproceedings{mishra-etal-2026-models,
  title     = "The Model{'}s Language Matters: A Comparative Privacy Analysis of {LLM}s",
  author    = "Mishra, Abhishek Kumar and
               Boutet, Antoine and
               Magnana, Lucas",
  editor    = "Demberg, Vera and
               Inui, Kentaro and
               Marquez, Llu{\'i}s",
  booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
  month     = mar,
  year      = "2026",
  address   = "Rabat, Morocco",
  publisher = "Association for Computational Linguistics",
  url       = "https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.54/",
  pages     = "1038--1048",
  isbn      = "979-8-89176-386-9",
  abstract  = "Large Language Models (LLMs) are increasingly deployed in multilingual settings that process sensitive data, yet their scale and linguistic variability can amplify privacy risks. While prior privacy evaluations focus predominantly on English, we investigate how language structure shapes privacy leakage in LLMs trained on English, Spanish, French, and Italian medical corpora. We quantify six corpus-level linguistic indicators and evaluate vulnerability under three attack families: extraction, counterfactual memorization, and membership inference. Across languages, we find that leakage systematically tracks structural properties: Italian exhibits the strongest exposure, consistent with its highest redundancy and longer lexical units, whereas English shows the clearest membership separability, aligning with its higher syntactic entropy and stronger surface-identifiable cues. In contrast, French and Spanish remain comparatively more resilient overall, aided by higher morphological complexity. These results provide quantitative evidence that language matters for privacy leakage, motivating language-aware and structure-adaptive privacy-preserving mechanisms for multilingual LLM deployments."
}
@comment{Copy-paste residue from the ACL Anthology page ("Markdown (Informal)" citation snippet), kept here for reference only:
[The Model’s Language Matters: A Comparative Privacy Analysis of LLMs](https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.54/) (Mishra et al., Findings 2026)
ACL}