Role-Sensitive Neurons: A Neuron-Level Gain Control Mechanism for Confidence Steering

Peiwen Huang; Chih-Hao Hsu; Tzu-Hung Huang; Shou-De Lin

Abstract

Role-playing prompts effectively steer Large Language Models (LLMs), yet the neural mechanism driving this behavioral shift remains unclear. In this work, we identify Role-Sensitive Neurons (RSNs), a sparse sub-network (≈ 0.5% of all neurons) that governs the transition from hesitation to action. Using a novel evaluation framework with explicit abstention (MMLU-E), we reveal a Confidence-Performance Decoupling: roles primarily modulate the model’s probabilistic "willingness to act" rather than its underlying knowledge representation. We demonstrate that RSNs function as a mechanistic gain control system: causal intervention on this subspace allows precise regulation of abstention behavior. Furthermore, cross-model transfer experiments confirm that these circuits emerge during pre-training, with Instruction Tuning (SFT) acting merely as a "signal sharpener" that refines latent gain dynamics. Finally, we identify a critical safety boundary: in knowledge-deficient models, amplifying RSNs induces "unwarranted certainty," showing that decisiveness is a tunable gain parameter distinct from epistemic truth.
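The abstract does not say how RSNs are located or how the intervention is applied, so the following is only a toy illustration of the "gain control" idea, not the authors' procedure. It assumes that MMLU-E adds an explicit abstain option next to choices A-D, and it models the intervention as multiplying a sparse set of hidden activations by a scalar gain. The names `RSN_IDS`, `ANSWER_IDS`, `ABSTAIN_LOGIT`, `apply_gain`, `option_probs`, and `make_hidden`, and the hand-built readout, are all hypothetical. Because the RSN term is added equally to every answer logit, raising the gain lowers P(abstain) without reordering the answers, which mirrors the Confidence-Performance Decoupling. With flat evidence, the same gain drives abstention toward zero while the answer distribution stays uniform, a toy version of "unwarranted certainty."

```python
import math

# Toy sizes and roles; all hypothetical, chosen only for illustration.
HIDDEN_SIZE = 200
RSN_IDS = [0]               # 1 of 200 neurons (0.5%) acts as the "role-sensitive" unit
ANSWER_IDS = [1, 2, 3, 4]   # "knowledge" neurons carrying evidence for options A-D
ABSTAIN_LOGIT = 1.5         # fixed logit for the (assumed) explicit abstain option


def apply_gain(hidden, neuron_ids, gain):
    """Return a copy of `hidden` with the selected neurons multiplied by `gain`."""
    out = list(hidden)
    for i in neuron_ids:
        out[i] *= gain
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def option_probs(hidden):
    """Probabilities for [A, B, C, D, abstain] under a hand-built toy readout."""
    # The RSN contribution is added equally to every answer logit, so it shifts
    # answering vs. abstaining without changing the ranking among answers.
    rsn = sum(hidden[i] for i in RSN_IDS)
    logits = [hidden[i] + rsn for i in ANSWER_IDS] + [ABSTAIN_LOGIT]
    return softmax(logits)


def make_hidden(evidence):
    """Build a toy hidden state with RSN activation 1.0 and the given answer evidence."""
    h = [0.0] * HIDDEN_SIZE
    for i in RSN_IDS:
        h[i] = 1.0
    for i, e in zip(ANSWER_IDS, evidence):
        h[i] = e
    return h


knowledgeable = make_hidden([2.0, 0.0, 0.0, 0.0])  # clear evidence for A
deficient = make_hidden([0.0, 0.0, 0.0, 0.0])      # no evidence for any option

for gain in (0.0, 1.0, 3.0):
    for name, h in (("knowledgeable", knowledgeable), ("deficient", deficient)):
        p = option_probs(apply_gain(h, RSN_IDS, gain))
        top = "ABCD"[max(range(4), key=lambda k: p[k])]
        print(f"{name:13s} gain={gain}: P(abstain)={p[4]:.3f}, top answer={top}")
```

Adding the same RSN contribution to every answer logit is the simplest way to separate "whether to answer" from "which answer" in a toy; a real model need not factor this cleanly.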

Anthology ID: 2026.findings-acl.294
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5924–5944
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.294/
Cite (ACL): Peiwen Huang, Chih-Hao Hsu, Tzu-Hung Huang, and Shou-De Lin. 2026. Role-Sensitive Neurons: A Neuron-Level Gain Control Mechanism for Confidence Steering. In Findings of the Association for Computational Linguistics: ACL 2026, pages 5924–5944, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Role-Sensitive Neurons: A Neuron-Level Gain Control Mechanism for Confidence Steering (Huang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.294.pdf
Checklist: 2026.findings-acl.294.checklist.pdf
