SEPS: A Separability Measure for Robust Unlearning in LLMs

Wonje Jeung, Sangyeon Yoon, Albert No


Abstract
Machine unlearning aims to selectively remove targeted knowledge from Large Language Models (LLMs), ensuring they forget specified content while retaining essential information. Existing unlearning metrics assess whether a model correctly answers retain queries and rejects forget queries, but they fail to capture real-world scenarios where forget queries rarely appear in isolation. In practice, forget and retain queries often coexist within the same prompt, making mixed-query evaluation crucial. We introduce SEPS, an evaluation framework that explicitly measures a model's ability to both forget and retain information within a single prompt. Through extensive experiments across three benchmarks, we identify two key failure modes in existing unlearning methods: (1) untargeted unlearning indiscriminately erases both forget and retain content once a forget query appears, and (2) targeted unlearning overfits to single-query scenarios, leading to catastrophic failures when handling multiple queries. To address these issues, we propose Mixed Prompt (MP) unlearning, a strategy that integrates both forget and retain queries into a unified training objective. Our approach significantly improves unlearning effectiveness, demonstrating robustness even in complex settings with up to eight mixed forget and retain queries in a single prompt.
Anthology ID:
2025.emnlp-main.283
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5556–5587
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.283/
Cite (ACL):
Wonje Jeung, Sangyeon Yoon, and Albert No. 2025. SEPS: A Separability Measure for Robust Unlearning in LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5556–5587, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
SEPS: A Separability Measure for Robust Unlearning in LLMs (Jeung et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.283.pdf
Checklist:
2025.emnlp-main.283.checklist.pdf