How Value Induction Reshapes LLM Behavior

Arnav Arora; Natalie Schluter; Katherine Metcalf; Maartje Ter Hoeve

Abstract

Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model. However, values are complex and inter-related: incorporating one can modify behaviour on another. Further, incorporating certain values can make models more addictive or sycophantic, with potentially detrimental effects on the users interacting with them. We investigate these and other unintended effects of incorporating values into models. We fine-tune models on value subsets of existing preference datasets and measure the effect of inducing 15 values on safety, anthropomorphism, and various QA benchmarks. We find that i) inducing a value also leads to the expression of other related, and sometimes contrastive, values, ii) inducing positive values increases safety, and iii) all values increase models' use of anthropomorphic language, making them more validating and sycophantic.

Anthology ID: 2026.findings-acl.1302
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26131–26152
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1302/
Cite (ACL): Arnav Arora, Natalie Schluter, Katherine Metcalf, and Maartje Ter Hoeve. 2026. How Value Induction Reshapes LLM Behavior. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26131–26152, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): How Value Induction Reshapes LLM Behavior (Arora et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1302.pdf
Checklist: 2026.findings-acl.1302.checklist.pdf
