Qianshan Wei


2025

Forget the Token and Pixel: Rethinking Gradient Ascent for Concept Unlearning in Multimodal Generative Models
Jiaqi Li | Chuanyi Zhang | Miaozeng Du | Hui Zhang | Yongrui Chen | Qianshan Wei | Junfeng Fang | Ruipeng Wang | Sheng Bi | Guilin Qi
Findings of the Association for Computational Linguistics: ACL 2025

Gradient Ascent (GA) has emerged as a promising approach for concept unlearning in Multimodal Generative Models (MGMs), such as Multimodal Large Language Models (MLLMs) and Stable Diffusion Models (SDMs). Despite its effectiveness in removing undesired knowledge, GA causes severe utility degradation in MGMs. In this paper, we investigate the mechanism behind this degradation by quantifying two distinct forms of knowledge in MGMs: (i) Conceptual Knowledge, the specific information a model holds about individual concepts; and (ii) Natural Knowledge, the ability to produce coherent and logically structured outputs. Our analysis reveals that applying GA globally not only removes the targeted Conceptual Knowledge but also inadvertently erodes Natural Knowledge, leading to utility collapse. To address this issue, we propose Forget the Token and Pixel (FTTP), a novel approach that selectively applies GA to the tokens and pixels carrying the targeted Conceptual Knowledge while preserving Natural Knowledge through Gradient Descent (GD). FTTP requires neither an additional retain set nor a large number of training steps, thereby reducing computational cost. Extensive experiments demonstrate FTTP's efficiency and its superior utility-unlearning tradeoff on both text and image generation tasks. Our source code will be released in the near future.
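
To make the token-selective idea concrete for the text branch, here is a minimal PyTorch sketch; it is an illustration of the general technique the abstract describes, not the authors' released implementation. It applies GA (a negated per-token loss) only on tokens flagged as carrying the target concept and ordinary GD on all other tokens, so one backward pass handles both objectives. The function name `fttp_style_loss` and the `concept_mask` input are hypothetical conventions introduced here for the example.

```python
import torch
import torch.nn.functional as F

def fttp_style_loss(logits: torch.Tensor,
                    labels: torch.Tensor,
                    concept_mask: torch.Tensor) -> torch.Tensor:
    """Token-selective unlearning loss (illustrative sketch).

    logits:       (batch, seq_len, vocab) language-model outputs
    labels:       (batch, seq_len) target token ids
    concept_mask: (batch, seq_len) bool, True where a token belongs
                  to the concept to be forgotten (hypothetical input;
                  how concept tokens are identified is not shown here)
    """
    # Per-token cross-entropy: cross_entropy expects the class
    # dimension second, hence the transpose to (batch, vocab, seq_len).
    per_token = F.cross_entropy(
        logits.transpose(1, 2),
        labels,
        reduction="none",
    )  # -> (batch, seq_len)

    # Map True -> -1.0 (negate loss: gradient ascent, i.e., forget)
    # and False -> +1.0 (standard loss: gradient descent, i.e., retain).
    sign = 1.0 - 2.0 * concept_mask.float()

    return (sign * per_token).mean()
```

Negating the loss on the selected tokens flips their gradient direction, so a single optimizer step simultaneously pushes the model away from the concept tokens and toward fluent predictions elsewhere, which is why no separate retain set is needed in this sketch.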