@inproceedings{zhao-etal-2025-neuron,
title = "Neuron Empirical Gradient: Discovering and Quantifying Neurons' Global Linear Controllability",
author = "Zhao, Xin and
Jiang, Zehui and
Yoshinaga, Naoki",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1041/",
pages = "21446--21477",
ISBN = "979-8-89176-251-0",
abstract = "While feed-forward neurons in pre-trained language models (PLMs) can encode knowledge, past research targeted a small subset of neurons that heavily influence outputs. This leaves the broader role of neuron activations unclear, limiting progress in areas like knowledge editing. We uncover a global linear relationship between neuron activations and outputs using neuron interventions on a knowledge probing dataset. The gradient of this linear relationship, which we call the neuron empirical gradient (NEG), captures how changes in activations affect predictions. To compute NEG efficiently, we propose NeurGrad, enabling large-scale analysis of neuron behavior in PLMs. We also show that NEG effectively captures language skills across diverse prompts through skill neuron probing. Experiments on MCEval8k, a multi-genre multiple-choice knowledge benchmark, support NEG{'}s ability to represent model knowledge. Further analysis highlights the key properties of NEG-based skill representation: efficiency, robustness, flexibility, and interdependency. Code and data are released."
}
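To make the abstract's core idea concrete, here is a minimal sketch of estimating a neuron's empirical gradient: intervene on a single neuron's activation at several magnitudes and fit a line to the resulting change in an output logit, taking the slope as the gradient. The toy two-layer model, the choice of intervention magnitudes, and the least-squares fit are illustrative assumptions only; the paper's NeurGrad estimator and its probing protocol are defined in the paper and its released code.

```python
# Hypothetical sketch of a neuron-intervention gradient estimate.
# NOT the paper's NeurGrad method; a finite-difference illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a PLM feed-forward block: hidden "neurons" -> output logits.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4))

def logit_with_intervention(x, neuron_idx, delta, target_class):
    """Run the model, shifting one hidden neuron's activation by `delta`,
    and return the logit of the target class."""
    h = model[1](model[0](x))  # neuron activations after the GELU
    h = h.clone()
    h[:, neuron_idx] += delta  # intervene on a single neuron
    return model[2](h)[0, target_class]

x = torch.randn(1, 16)
target_class, neuron_idx = 2, 5

# Probe the neuron with several intervention magnitudes; if the
# activation-output relationship is (locally) linear, the fitted slope
# is the neuron's empirical gradient w.r.t. the target logit.
deltas = torch.linspace(-2.0, 2.0, 9)
with torch.no_grad():
    logits = torch.stack([
        logit_with_intervention(x, neuron_idx, d, target_class)
        for d in deltas
    ])

# Least-squares line fit: logits ~= slope * delta + intercept.
A = torch.stack([deltas, torch.ones_like(deltas)], dim=1)
slope = torch.linalg.lstsq(A, logits.unsqueeze(1)).solution[0, 0]
print(f"empirical gradient of neuron {neuron_idx}: {slope:.4f}")
```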