CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, Xiangliang Zhang
Abstract
Machine unlearning (MU) has gained significant attention as a means to remove the influence of specific data from a trained model without requiring full retraining. While progress has been made in unimodal domains such as text and image classification, unlearning in multimodal models remains relatively under-explored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on CIFAR-100, Flickr30K, and Conceptual 12M across five CLIP downstream tasks, as well as an evaluation on diffusion models, demonstrate that CLIPErase effectively removes designated associations from multimodal samples in downstream tasks while preserving the model's performance on the retain set after unlearning.
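The abstract describes a three-term objective (forgetting, retention, and consistency). The sketch below is an illustrative assumption of how such a composite loss could be assembled over precomputed CLIP embeddings; the function names, the cosine-similarity forgetting term, the MSE consistency term, and the loss weights are hypothetical and are not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): a composite unlearning objective
# with forgetting, retention, and consistency terms, assuming precomputed
# CLIP image/text embeddings. Names, terms, and weights are hypothetical.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE loss over a batch of paired embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

def unlearning_loss(
    forget_img, forget_txt,            # forget-set pair embeddings (current model)
    retain_img, retain_txt,            # retain-set pair embeddings (current model)
    retain_img_orig, retain_txt_orig,  # retain-set embeddings from the frozen original model
    lambda_forget=1.0, lambda_retain=1.0, lambda_consist=1.0,
):
    # Forgetting term: push image-text alignment on the forget set toward zero
    # by penalizing cosine similarity between paired embeddings.
    forget_loss = F.cosine_similarity(
        F.normalize(forget_img, dim=-1), F.normalize(forget_txt, dim=-1)
    ).mean()

    # Retention term: keep the usual contrastive alignment on the retain set.
    retain_loss = clip_contrastive_loss(retain_img, retain_txt)

    # Consistency term: keep retain-set embeddings close to the original model's.
    consist_loss = (
        F.mse_loss(retain_img, retain_img_orig) + F.mse_loss(retain_txt, retain_txt_orig)
    )

    return lambda_forget * forget_loss + lambda_retain * retain_loss + lambda_consist * consist_loss
```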
- Anthology ID:
- 2025.acl-long.1469
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 30438–30452
- URL:
- https://preview.aclanthology.org/landing_page/2025.acl-long.1469/
- Cite (ACL):
- Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, and Xiangliang Zhang. 2025. CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30438–30452, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP (Yang et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.acl-long.1469.pdf