Side Effects of Erasing Concepts from Diffusion Models

Shaswati Saha; Sourajit Saha; Manas Gaur; Tejas Gokhale

doi:10.18653/v1/2025.findings-emnlp.810

Abstract

Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of concept erasure techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired “target” concepts specified by the user, while preserving the ability to synthesize high-quality images of other concepts. In this work, we demonstrate that concept erasure has side effects and CETs can be easily circumvented. For a comprehensive measurement of the robustness of CETs, we present the Side Effect Evaluation (SEE) benchmark that consists of hierarchical and compositional prompts describing objects and their attributes. The dataset and an automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy, semantically similar prompts, and compositional variants of the target. We show that CETs suffer from attribute leakage and a counterintuitive phenomenon of attention concentration or dispersal. We release our benchmark and evaluation tools to aid future work on robust concept erasure.

Anthology ID: 2025.findings-emnlp.810
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14991–15007
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.810/
DOI: 10.18653/v1/2025.findings-emnlp.810
Cite (ACL): Shaswati Saha, Sourajit Saha, Manas Gaur, and Tejas Gokhale. 2025. Side Effects of Erasing Concepts from Diffusion Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14991–15007, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Side Effects of Erasing Concepts from Diffusion Models (Saha et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.810.pdf
Checklist: 2025.findings-emnlp.810.checklist.pdf
