Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models

Chenchen Yuan, Zheyu Zhang, Gjergji Kasneci


Abstract
Large language models often display heterogeneous moral preferences across settings. We study inference-time steering toward a desired ethical framework while preserving general competence. We present Convergent-Divergent Routing, which traces and edits minimal branch points inside transformer blocks where ethical-framework-related pathways first converge and then diverge. Gating non-target branches at these loci blocks their downstream propagation while leaving upstream computations intact. We find that this intervention alone increases reasoning within the targeted ethical framework. To achieve fine-grained control, we adapt Common Spatial Patterns to the residual stream and extract, for each branch-point layer, a pair of directions that discriminate between the utilitarian and deontological frameworks. We then introduce Dual Logit Calibration, a closed-form, minimum-L2-norm update that moves the residual within this two-dimensional subspace so that the resulting directional projections align with user-specified preference weights. Experiments on real-life moral dilemmas show that our method reliably achieves preference calibration while largely preserving general capabilities, outperforming recent baselines and providing an interpretable mechanism.
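The abstract names two numerical steps: extracting a pair of discriminative residual-stream directions with Common Spatial Patterns (CSP), and a closed-form minimum-norm update that moves the residual only within the span of those directions until its projections hit chosen values. The sketch below shows the textbook form of both under stated assumptions; it is not the authors' implementation. Assumed here: activations arrive as NumPy arrays of shape (n_samples, d) for one layer, CSP is solved by whitening the pooled covariance, the two directions are rescaled to unit norm, and the user's preference weights have already been turned into two target projections (the abstract does not say how that conversion is done). The names `csp_directions` and `min_norm_calibrate`, the synthetic data, and the target values are hypothetical.

```python
import numpy as np


def csp_directions(acts_a: np.ndarray, acts_b: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Return a (d, 2) matrix of unit-norm CSP directions for two activation sets.

    acts_a, acts_b: (n_samples, d) residual-stream vectors at one layer, e.g. from
    utilitarian (a) and deontological (b) prompts (hypothetical inputs).
    Column 0 maximises the variance of class a relative to the pooled variance;
    column 1 minimises it (i.e. favours class b).
    """
    d = acts_a.shape[1]
    # Mean-centred class covariances, lightly regularised so they are invertible.
    cov_a = np.cov(acts_a, rowvar=False) + eps * np.eye(d)
    cov_b = np.cov(acts_b, rowvar=False) + eps * np.eye(d)
    # Symmetric whitening matrix P = (cov_a + cov_b)^(-1/2).
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Eigenvalues lam of P cov_a P equal w^T cov_a w / w^T (cov_a + cov_b) w for w = P u.
    lam, rot = np.linalg.eigh(whitener @ cov_a @ whitener)
    filters = whitener @ rot  # columns are CSP filters in the original space
    w = filters[:, [-1, 0]]  # eigh sorts ascending: largest lam first, smallest second
    return w / np.linalg.norm(w, axis=0, keepdims=True)


def min_norm_calibrate(h: np.ndarray, w: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return h + delta with the smallest ||delta||_2 such that w.T @ (h + delta) == target.

    Closed form: delta = W (W^T W)^{-1} (target - W^T h), which lies in span(W),
    so the residual only moves inside the two-dimensional subspace.
    """
    gram = w.T @ w  # 2 x 2, invertible when the two directions are independent
    delta = w @ np.linalg.solve(gram, target - w.T @ h)
    return h + delta


# Usage on synthetic data: class a varies most along axis 0, class b along axis 1.
rng = np.random.default_rng(0)
d = 4
acts_util = rng.normal(size=(2000, d)) * np.array([5.0, 1.0, 1.0, 1.0])
acts_deon = rng.normal(size=(2000, d)) * np.array([1.0, 5.0, 1.0, 1.0])
W = csp_directions(acts_util, acts_deon)
h = rng.normal(size=d)
# Hypothetical target projections standing in for user preference weights.
h_new = min_norm_calibrate(h, W, np.array([2.0, -1.0]))
print(W.T @ h_new)  # matches the targets up to floating-point error
```

Solving the 2 × 2 system with `np.linalg.solve` avoids forming an explicit inverse. It requires the two directions to be linearly independent, which holds for CSP filters built from distinct eigenvectors because the whitening matrix is invertible.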
Anthology ID: 2026.acl-long.1933
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 41698–41721
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1933/
Cite (ACL): Chenchen Yuan, Zheyu Zhang, and Gjergji Kasneci. 2026. Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41698–41721, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models (Yuan et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1933.pdf
Checklist: 2026.acl-long.1933.checklist.pdf