Diagnosing Hidden Instabilities in Model Editing via Uncertainty Quantification

Zihan Gu; TianYi Zhang; Xinyan Zhang; Zhiyuan Wang; Han Zhang; Yuhao Wei; Jiacheng Lu; Tianyi Ma; Xingsheng Zhang; Hua Zhang; Yue Hu (胡月)

Abstract

Model editing provides a promising mechanism for updating large language models (LLMs) without expensive retraining. Existing approaches, particularly locate-and-edit methods based on least-squares optimization, aim to introduce targeted knowledge changes while preserving pre-trained behavior. In this work, we show that this objective is fundamentally fragile under standard single-edit evaluation protocols. We first develop a unified theoretical framework that characterizes activation-based editing as a constrained intervention on intermediate representations. Within this framework, we demonstrate that least-squares edits cannot, in general, isolate target updates from unrelated activations, giving rise to unavoidable interference that accumulates with successive edits. Crucially, this degradation can remain undetected in single-edit settings when assessed using conventional success and locality metrics. To expose such hidden instabilities, we introduce an uncertainty-based evaluation protocol that combines structured semantic perturbations with uncertainty quantification based on Sampling with Perturbation for UQ. By measuring edit-induced growth in aleatoric and epistemic uncertainty, our method reveals local knowledge conflicts that are invisible to existing benchmarks. Extensive experiments across multiple models, datasets, and editing algorithms show that both least-squares and other parameter-update-based methods consistently increase post-edit uncertainty. Together, our results suggest that current evaluation practices substantially overestimate the reliability of single-edit model editing, and that uncertainty-based diagnostics are necessary for assessing edit stability.

Anthology ID: 2026.acl-long.1502
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 32544–32566
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1502/
Cite (ACL): Zihan Gu, TianYi Zhang, Xinyan Zhang, Zhiyuan Wang, Han Zhang, Yuhao Wei, Jiacheng Lu, Tianyi Ma, Xingsheng Zhang, Hua Zhang, and Yue Hu. 2026. Diagnosing Hidden Instabilities in Model Editing via Uncertainty Quantification. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32544–32566, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Diagnosing Hidden Instabilities in Model Editing via Uncertainty Quantification (Gu et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1502.pdf
Checklist: 2026.acl-long.1502.checklist.pdf
