A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization

Anda Cheng, Wei Huang, Yinggui Wang


Abstract
Large Language Model Unlearning (LLMU) is a promising way to remove private or sensitive information from large language models. However, the comprehensive evaluation of LLMU remains underexplored. The dominant deterministic evaluation can yield overly optimistic assessments of unlearning efficacy. To mitigate this, we propose a Fully Probabilistic Evaluation (FPE) framework that incorporates input and output distributions in LLMU evaluation. FPE obtains a probabilistic evaluation result by querying unlearned models with various semantically similar inputs and multiple sampling attempts. We introduce an Input Distribution Sampling method in FPE to select high-quality inputs, enabling a stricter measure of information leakage risks. Furthermore, we introduce a Contrastive Embedding Loss (CEL) to advance the performance of LLMU. CEL employs contrastive learning to distance latent representations of unlearned samples from adaptively clustered contrast samples while aligning them with random vectors, leading to improved efficacy and robustness for LLMU. Our experiments show that FPE uncovers more unlearned information leakage risks than prior evaluation methods, and CEL improves unlearning effectiveness by at least 50.1% and robustness by at least 37.2% on Llama-2-7B while retaining high model utility.
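As a rough illustration of the probabilistic evaluation idea described in the abstract (not the authors' implementation), the sketch below probes an unlearned model with several semantically similar prompts and several stochastic samples per prompt, then reports the fraction of generations that still reveal the forgotten answer. The function name, the substring-match leakage check, and all parameter values are hypothetical simplifications.

```python
import torch

@torch.no_grad()
def probabilistic_leakage(model, tokenizer, paraphrased_prompts, forgotten_answer,
                          num_samples=10, max_new_tokens=64, temperature=1.0):
    """Estimate leakage under input and output sampling (illustrative sketch).

    Instead of a single greedy query, probe the unlearned model with multiple
    paraphrased prompts (input distribution) and multiple sampled generations
    per prompt (output distribution), and report how often the forgotten
    answer still appears.
    """
    leaks, total = 0, 0
    for prompt in paraphrased_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        for _ in range(num_samples):
            out = model.generate(**inputs, do_sample=True, temperature=temperature,
                                 max_new_tokens=max_new_tokens)
            # Decode only the newly generated continuation.
            text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                                    skip_special_tokens=True)
            leaks += int(forgotten_answer.lower() in text.lower())
            total += 1
    return leaks / total
```

Averaging over both the input distribution (paraphrases) and the output distribution (sampled generations) is what separates this kind of check from a single deterministic greedy-decoding query, which can report zero leakage even when the forgotten information remains recoverable.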
Anthology ID:
2025.emnlp-main.452
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8943–8954
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.452/
Cite (ACL):
Anda Cheng, Wei Huang, and Yinggui Wang. 2025. A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8943–8954, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization (Cheng et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.452.pdf
Checklist:
 2025.emnlp-main.452.checklist.pdf