Understanding How Value Neurons Shape the Generation of Specified Values in LLMs

Yi Su, Jiayi Zhang, Shu Yang, Xinhai Wang, Lijie Hu, Di Wang


Abstract
The rapid integration of large language models (LLMs) into societal applications has intensified concerns about their alignment with universal ethical principles, as their internal value representations remain opaque despite advances in behavioral alignment. Current approaches struggle to systematically interpret how values are encoded in neural architectures, limited by datasets that prioritize superficial judgments over mechanistic analysis. To address this gap, we introduce ValueLocate, a mechanistic interpretability framework grounded in the Schwartz Values Survey. Our method first constructs ValueInsight, a dataset that operationalizes four dimensions of universal values through real-world behavioral contexts. Leveraging this dataset, we develop a neuron identification method that calculates activation differences between opposing value aspects, enabling precise localization of value-critical neurons without relying on computationally intensive attribution methods. Our proposed validation method demonstrates that targeted manipulation of these neurons effectively alters model value orientations, establishing causal relationships between neurons and value representations. This work advances the foundation for value alignment by bridging psychological value frameworks with neuron-level analysis in LLMs.
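The abstract describes locating value-critical neurons by contrasting activations elicited by opposing aspects of a value. The sketch below illustrates one plausible form of that computation, assuming a HuggingFace causal LM (gpt2 as a stand-in), forward hooks on MLP activations, and toy prompt sets; it is not the authors' released implementation of ValueLocate.

```python
# Minimal sketch: rank MLP neurons by mean activation difference between
# two prompt sets representing opposing aspects of one value dimension.
# Model choice, prompts, and hook placement are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any GPT-2-style model with .transformer.h[i].mlp
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_mlp_activations(prompts):
    """Average post-nonlinearity MLP activations per neuron, per layer."""
    sums, counts, handles = {}, {}, []

    def make_hook(layer_idx):
        def hook(module, inputs, output):
            # output: (batch, seq, intermediate_size); average over batch and tokens
            act = output.detach().float().mean(dim=(0, 1))
            sums[layer_idx] = sums.get(layer_idx, 0) + act
            counts[layer_idx] = counts.get(layer_idx, 0) + 1
        return hook

    for i, block in enumerate(model.transformer.h):
        handles.append(block.mlp.act.register_forward_hook(make_hook(i)))

    with torch.no_grad():
        for p in prompts:
            model(**tokenizer(p, return_tensors="pt"))

    for h in handles:
        h.remove()
    return {i: sums[i] / counts[i] for i in sums}

# Toy stand-ins for opposing value aspects (illustrative only).
pro_prompts = ["Helping a stranger carry groceries is the right thing to do."]
con_prompts = ["Ignoring a stranger who needs help is perfectly acceptable."]

acts_pro = mean_mlp_activations(pro_prompts)
acts_con = mean_mlp_activations(con_prompts)

# Candidate value neurons: largest absolute activation difference per layer.
candidates = []
for layer, a in acts_pro.items():
    diff = (a - acts_con[layer]).abs()
    top = torch.topk(diff, k=5)
    candidates += [(layer, int(n), float(d)) for d, n in zip(top.values, top.indices)]

candidates.sort(key=lambda t: -t[2])
print(candidates[:10])  # (layer, neuron index, |activation difference|)
```

In practice one would use many prompts per aspect and a held-out validation step (e.g., ablating or scaling the selected neurons and measuring the shift in generated value orientation), but the ranking-by-activation-difference step above captures the core idea.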
Anthology ID:
2025.findings-emnlp.501
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9433–9452
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.501/
DOI:
10.18653/v1/2025.findings-emnlp.501
Cite (ACL):
Yi Su, Jiayi Zhang, Shu Yang, Xinhai Wang, Lijie Hu, and Di Wang. 2025. Understanding How Value Neurons Shape the Generation of Specified Values in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9433–9452, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Understanding How Value Neurons Shape the Generation of Specified Values in LLMs (Su et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.501.pdf
Checklist:
2025.findings-emnlp.501.checklist.pdf