Hong Kyu Lee

2026

Direct Token Optimization: A Self-Contained Approach to Large Language Model Unlearning
Hong Kyu Lee | Ruixuan Liu | Li Xiong
Findings of the Association for Computational Linguistics: ACL 2026

Machine unlearning is an emerging technique that removes the influence of a subset of training data (the forget set) from a model without full retraining, with applications including privacy protection, content moderation, and model correction. The key challenge lies in achieving strong unlearning efficacy while preserving overall model utility. Existing unlearning methods for large language models (LLMs) often rely on auxiliary models, retain datasets, or even commercial AI services. However, dependence on these external resources is often impractical and can introduce additional privacy risks. In this work, we propose direct token optimization (DTO), a self-contained unlearning approach for LLMs that directly optimizes token-level objectives to unlearn specific sequences without external resources. For each sequence to be unlearned, we identify target tokens that encode critical knowledge for unlearning and treat the remaining tokens as non-target tokens used to maintain model utility. DTO maximizes an unlearning objective on target tokens and applies a utility-preservation regularizer on non-target tokens. Across multiple unlearning benchmarks, DTO improves forget quality by up to 16.8× over the latest baselines while maintaining comparable model utility. Our code is available at github.com/Emory-AIMS/direct_token_optimization.
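
The split between target and non-target tokens described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the exact objectives, so the sketch assumes the unlearning objective is the negative log-likelihood (to be increased) on target tokens, and the regularizer is a KL divergence to a frozen pre-unlearning copy of the same model's outputs on non-target tokens. The function name `token_split_loss`, the weight `lam`, and the mask convention are all hypothetical.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the vocabulary axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def token_split_loss(logits, ref_logits, labels, target_mask, lam=1.0):
    """Illustrative token-level loss (assumed form, not the paper's).

    logits      : (T, V) current model logits for one sequence
    ref_logits  : (T, V) logits from a frozen copy taken before unlearning
    labels      : (T,)   next-token ids
    target_mask : (T,)   1 for target tokens, 0 for non-target tokens
    """
    logp = log_softmax(logits)
    ref_logp = log_softmax(ref_logits)
    target = target_mask.astype(bool)

    # Per-token negative log-likelihood of the observed next token.
    nll = -logp[np.arange(len(labels)), labels]

    # Assumed forget term: minimizing -NLL raises the NLL on target tokens.
    forget = -nll[target].mean() if target.any() else 0.0

    # Assumed utility term: KL(ref || current) on non-target tokens.
    kl = (np.exp(ref_logp) * (ref_logp - logp)).sum(axis=-1)
    retain = kl[~target].mean() if (~target).any() else 0.0

    return forget + lam * retain, forget, retain
```

Minimizing the returned total lowers the model's likelihood of the target tokens while penalizing drift from its original predictions everywhere else. How target tokens are actually identified, and the true form of both terms, are given in the paper and the linked repository.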

Co-authors

Ruixuan Liu (1)
Li Xiong (1)

Venues

Findings (1)
