LLM Beliefs Are in Their Heads

Alessandro Corona Mendozza, Anders Søgaard


Abstract
We investigate belief-like representations in decoder-only autoregressive LLMs using linear controlled probes on residual stream activations and single attention heads. Following Herrmann and Levinstein’s (2025) criteria (Accuracy, Use, Coherence, and Uniformity), we find that large models exhibit strong truth sensitivity (Accuracy) and that steering activations along probe directions reliably changes downstream behavior (Use). Coherence, measured via calibrated probes and cross-dataset probing, is moderate across models, while training on diverse data yields domain-consistent truth directions (Uniformity). The results are particularly encouraging at the head level and align with some standard philosophical accounts of belief, e.g., minimal functionalism, supporting the view that LLMs can maintain propositional attitudes under such theoretical frameworks.
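The two operations the abstract names, fitting a linear probe and steering along its direction, can be illustrated with a minimal sketch. This is not the authors' method: it substitutes a simple mean-difference probe for their controlled probes, runs on synthetic vectors rather than real residual-stream or attention-head activations, and every name in it (`fit_mean_difference_probe`, `probe_predict`, `steer`, `alpha`) is hypothetical. It also only checks how steering moves the probe's own readout, whereas the paper evaluates changes in downstream model behavior.

```python
import numpy as np


def fit_mean_difference_probe(acts, labels):
    """Fit a linear probe as the normalized difference of class means.

    acts:   (n, d) array of activation vectors (synthetic here).
    labels: (n,) array with 1 for true statements and 0 for false ones.
    Returns a unit-norm direction and a scalar bias.
    """
    mu_true = acts[labels == 1].mean(axis=0)
    mu_false = acts[labels == 0].mean(axis=0)
    direction = mu_true - mu_false
    direction /= np.linalg.norm(direction)
    # Place the decision threshold midway between the projected class means.
    bias = -0.5 * (mu_true + mu_false) @ direction
    return direction, bias


def probe_predict(acts, direction, bias):
    """Classify each activation as true (1) or false (0)."""
    return (acts @ direction + bias > 0).astype(int)


def steer(acts, direction, alpha):
    """Shift activations by alpha units along the probe direction."""
    return acts + alpha * direction


# Synthetic stand-in for activations: Gaussian noise plus a planted
# "truth direction" whose sign depends on the label (an assumption made
# purely for illustration).
rng = np.random.default_rng(0)
d, n = 64, 400
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + 3.0 * (2 * labels[:, None] - 1) * planted

direction, bias = fit_mean_difference_probe(acts, labels)
accuracy = (probe_predict(acts, direction, bias) == labels).mean()

# Push the "false" activations toward the "true" side and measure the
# fraction the probe now reads as true.
steered = steer(acts[labels == 0], direction, alpha=10.0)
flipped = probe_predict(steered, direction, bias).mean()

print(f"probe accuracy: {accuracy:.3f}")
print(f"fraction read as true after steering: {flipped:.3f}")
```

In a real setting, `acts` would come from a model's hidden states at a chosen layer or head, and the steered vectors would be written back into the forward pass so that the effect on generated text can be measured.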
Anthology ID:
2026.acl-long.1905
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
41033–41067
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1905/
Cite (ACL):
Alessandro Corona Mendozza and Anders Søgaard. 2026. LLM Beliefs Are in Their Heads. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41033–41067, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
LLM Beliefs Are in Their Heads (Mendozza & Søgaard, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1905.pdf
Checklist:
2026.acl-long.1905.checklist.pdf