Direct Token Optimization: A Self-Contained Approach to Large Language Model Unlearning

Hong Kyu Lee, Ruixuan Liu, Li Xiong


Abstract
Machine unlearning is an emerging technique that removes the influence of a subset of training data (the forget set) from a model without full retraining, with applications including privacy protection, content moderation, and model correction. The key challenge lies in achieving strong unlearning efficacy while preserving overall utility. Existing unlearning methods for large language models (LLMs) often rely on auxiliary models, retain datasets, or even commercial AI services. However, dependence on these external resources is often impractical and can introduce additional privacy risks. In this work, we propose direct token optimization (DTO), a self-contained unlearning approach for LLMs that directly optimizes token-level objectives to unlearn specific sequences without external resources. For each sequence to be unlearned, we identify target tokens that encode critical knowledge for unlearning and treat the remaining tokens as non-target tokens for maintaining model utility. DTO maximizes an unlearning objective on target tokens and applies a utility-preservation regularizer on non-target tokens. Across multiple unlearning benchmarks, DTO improves forget quality by up to 16.8× over the latest baselines while maintaining comparable model utility. Our code is available at github.com/Emory-AIMS/direct_token_optimization.
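The token-level split described in the abstract can be illustrated with a toy loss. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the unlearning objective, the regularizer, or how target tokens are selected. Here the forget term is assumed to be the mean log-probability of the target tokens (so minimizing it lowers their likelihood), the regularizer is assumed to be a squared drift from a frozen snapshot of the same model's log-probabilities, and the names `token_masked_unlearning_loss`, `ref_logp`, `target_mask`, and `reg_weight` are hypothetical.

```python
import math


def token_masked_unlearning_loss(logp, ref_logp, target_mask, reg_weight=1.0):
    """Toy per-sequence loss (illustrative assumption, not the paper's objective).

    logp        -- log-probability of each ground-truth token under the current model
    ref_logp    -- the same quantities from a frozen snapshot taken before unlearning
    target_mask -- True where a token is treated as a target (to be forgotten)
    reg_weight  -- weight of the utility-preservation term (hypothetical parameter)
    """
    if not (len(logp) == len(ref_logp) == len(target_mask)):
        raise ValueError("all inputs must have one entry per token")

    # Forget term: mean log-probability over target tokens.
    # Minimizing it pushes the likelihood of those tokens down.
    target = [lp for lp, m in zip(logp, target_mask) if m]
    forget_term = sum(target) / len(target) if target else 0.0

    # Retain term: squared drift from the snapshot on non-target tokens.
    # It is zero when the model still behaves as before on those tokens.
    drift = [(lp - r) ** 2
             for lp, r, m in zip(logp, ref_logp, target_mask) if not m]
    retain_term = sum(drift) / len(drift) if drift else 0.0

    return forget_term + reg_weight * retain_term


# Four-token sequence in which only the third token is marked as a target.
ref_logp = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.7)]
mask = [False, False, True, False]

# Before any update the model equals its snapshot, so only the forget term remains.
loss_start = token_masked_unlearning_loss(list(ref_logp), ref_logp, mask)

# Lowering the target token's probability reduces the loss.
forgot = list(ref_logp)
forgot[2] = math.log(0.1)
loss_forgot = token_masked_unlearning_loss(forgot, ref_logp, mask)

# Changing a non-target token is penalized by the retain term.
drifted = list(ref_logp)
drifted[0] = math.log(0.5)
loss_drifted = token_masked_unlearning_loss(drifted, ref_logp, mask)
```

In this sketch the two terms pull in different directions on disjoint sets of tokens, which is the structure the abstract describes; the actual objectives would need to be taken from the paper or the linked repository.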
Anthology ID:
2026.findings-acl.2088
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
42083–42100
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2088/
Cite (ACL):
Hong Kyu Lee, Ruixuan Liu, and Li Xiong. 2026. Direct Token Optimization: A Self-Contained Approach to Large Language Model Unlearning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42083–42100, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Direct Token Optimization: A Self-Contained Approach to Large Language Model Unlearning (Lee et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2088.pdf
Checklist:
 2026.findings-acl.2088.checklist.pdf