Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models

Boyang Zhang; Istemi Ekin Akkus; Ruichuan Chen; Alice Dethise; Klaus Satzke; Ivica Rimac; Yang Zhang

Abstract

Vision Language Models (VLMs) have demonstrated remarkable capabilities in processing multimodal data, but their advanced abilities also raise significant privacy concerns, particularly regarding Personally Identifiable Information (PII) leakage. While relevant research has been conducted on single-modal language models to some extent, the vulnerabilities in the multimodal setting have yet to be fully investigated. Our work assesses these emerging risks and introduces a concept-guided mitigation approach. By identifying and modifying the model’s internal states associated with PII-related content, our method guides VLMs to refuse PII-sensitive tasks effectively and efficiently, without requiring re-training or fine-tuning. We also address the current lack of multimodal PII datasets by constructing several that simulate real-world scenarios. Experimental results demonstrate that the method can achieve an average refusal rate of 93.3% across various PII-related tasks, with minimal impact on the model’s performance on unrelated tasks. We further examine the mitigation’s performance under various conditions to show the adaptability of our proposed method.

Anthology ID: 2026.findings-eacl.154
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2952–2965
URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.154/
Cite (ACL): Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen, Alice Dethise, Klaus Satzke, Ivica Rimac, and Yang Zhang. 2026. Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2952–2965, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models (Zhang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.154.pdf
Checklist: 2026.findings-eacl.154.checklist.pdf