Knowledge Localization and Editability in Small Language Models: A Multi-Stage Experimental Study

Pranamya Nilesh Deshpande; Aiswarya Konavoor; Sreedath Panat

Abstract

The internal mechanisms by which transformer-based language models encode and retrieve factual knowledge remain poorly understood, particularly for small language models (SLMs) operating in the 2–3 billion parameter range. This paper presents a systematic, multi-stage empirical investigation into knowledge localization, compression effects, and knowledge editability across four SLMs (Gemma-2B, Llama-3.2-3B-Instruct, Qwen-2.5-3B-Instruct, and Phi-2), with Meta-Llama-3-8B serving as a large-model baseline. Stage 1 employs causal tracing with activation patching on the CounterFact dataset (~450–500 validated facts per model) to identify the layers most causally responsible for factual recall. Stage 2 compares knowledge density, layer concentration, and redundancy between the 2–3B models and the 8B baseline to quantify the structural effects of model compression on knowledge storage. Stage 3 applies the Rank-One Model Editing (ROME) algorithm at the causally identified layers to assess whether localized knowledge can be reliably overwritten. Our results demonstrate that (i) factual knowledge in SLMs concentrates in upper-to-final transformer layers, with Llama-3B exhibiting extreme concentration in layer 28; (ii) compressed models store knowledge more densely per parameter but with substantially lower redundancy (Llama-3B: 0.047 vs. Llama-8B: 0.468); and (iii) editing success correlates strongly with architectural concentration rather than model size, with Llama-3B achieving 85.7% editing success versus 33% for Gemma-2B. These findings carry direct implications for interpretability, model editing, and the design of future small language model architectures.
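
To make the Stage 3 idea concrete, the following is a minimal sketch of a rank-one weight update of the kind ROME builds on, not the authors' implementation. It makes a simplifying assumption that is not in the paper: the key covariance matrix used by ROME is replaced by the identity, so the update reduces to W' = W + (v − Wk)kᵀ / (kᵀk). The names `rank_one_edit`, `matvec`, `W`, `k`, and `v_target` are hypothetical and chosen only for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, k, v_target):
    """Return a copy of W changed by a rank-one term so that W' k = v_target.

    Assumption (not from the paper): the key covariance is the identity,
    so the update is W' = W + (v_target - W k) k^T / (k^T k).
    """
    k_norm_sq = sum(x * x for x in k)
    if k_norm_sq == 0:
        raise ValueError("key vector must be non-zero")
    # Residual between the desired output and what W currently produces for k.
    residual = [t - c for t, c in zip(v_target, matvec(W, k))]
    # Add the outer product residual * k^T, scaled by 1 / (k^T k).
    return [
        [w + r * x / k_norm_sq for w, x in zip(row, k)]
        for row, r in zip(W, residual)
    ]


# Toy example: a 3x2 "layer" edited so that key k maps to a new value.
W = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
k = [1.0, 2.0]
v_target = [0.5, -1.0, 3.0]
W_edited = rank_one_edit(W, k, v_target)
```

Under this identity-covariance assumption the edited matrix maps `k` exactly to `v_target`, while any input orthogonal to `k` produces the same output as before the edit. The full ROME procedure additionally estimates the key covariance from data and optimizes the target value vector, neither of which is shown here.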

Anthology ID: 2026.knowfm-1.13
Volume: Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Canyu Chen, Yuji Zhang, Zoey Sha Li, Zihan Wang, Qineng Wang, Jinyan Su, Priyanka Kargupta, Sara Vera Marjanović, Jeff Z. Pan, Mohit Bansal, Isabelle Augenstein, Jiawei Han, Heng Ji, Manling Li
Venues: KnowFM | WS
Publisher: Association for Computational Linguistics
Pages: 165–172
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.13/
Cite (ACL): Pranamya Nilesh Deshpande, Aiswarya Konavoor, and Sreedath Panat. 2026. Knowledge Localization and Editability in Small Language Models: A Multi-Stage Experimental Study. In Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026), pages 165–172, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Knowledge Localization and Editability in Small Language Models: A Multi-Stage Experimental Study (Deshpande et al., KnowFM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.13.pdf
