Anders Søgaard

2026

LLM Beliefs Are in Their Heads
Alessandro Corona Mendozza | Anders Søgaard
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We investigate belief-like representations in decoder-only autoregressive LLMs using linear controlled probes on residual stream activations and single attention heads. Following Herrmann and Levinstein’s (2025) criteria (Accuracy, Use, Coherence, and Uniformity), we find that large models exhibit strong truth sensitivity (Accuracy), and that steering activations along probe directions reliably changes downstream behavior (Use). Coherence, measured via calibrated probes and cross-dataset probing, is moderate across models, while training on diverse data yields domain-consistent truth directions (Uniformity). The results are particularly encouraging at the head level and align with some standard philosophical accounts of belief, e.g., minimal functionalism, supporting the view that LLMs can maintain propositional attitudes under such theoretical frameworks.
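
As a rough illustration of what probing and steering along a probe direction mean, the sketch below fits a difference-of-means direction on synthetic vectors and then shifts one vector along it. This is a simple stand-in, not the paper's controlled probes: the activations are random toy data rather than residual stream or attention head outputs, and `fake_activation`, `truth_axis`, `DIM`, and the steering strength are all hypothetical.

```python
import random

random.seed(0)
DIM = 8  # hypothetical activation width
truth_axis = [1.0] + [0.0] * (DIM - 1)  # hypothetical "truth" axis in the toy data


def fake_activation(is_true):
    """Toy activation: Gaussian noise shifted along truth_axis by +1 or -1."""
    sign = 1.0 if is_true else -1.0
    return [random.gauss(0, 0.3) + sign * t for t in truth_axis]


def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


true_acts = [fake_activation(True) for _ in range(200)]
false_acts = [fake_activation(False) for _ in range(200)]

# Difference-of-means probe: direction from the "false" mean to the "true" mean,
# with the decision boundary placed at the midpoint between the two means.
mu_true, mu_false = mean(true_acts), mean(false_acts)
direction = [a - b for a, b in zip(mu_true, mu_false)]
midpoint = [(a + b) / 2 for a, b in zip(mu_true, mu_false)]
bias = -dot(direction, midpoint)


def probe_score(act):
    """Positive score means the probe reads the activation as 'true'."""
    return dot(direction, act) + bias


def steer(act, alpha):
    """Shift an activation by alpha times the probe direction."""
    return [a + alpha * d for a, d in zip(act, direction)]


correct = sum(probe_score(a) > 0 for a in true_acts)
correct += sum(probe_score(a) <= 0 for a in false_acts)
accuracy = correct / (len(true_acts) + len(false_acts))

sample = false_acts[0]
steered = steer(sample, 2.0)
print(f"probe accuracy on toy data: {accuracy:.3f}")
print(f"score before steering: {probe_score(sample):.2f}, after: {probe_score(steered):.2f}")
```

In this toy setting, steering by `alpha` raises the probe score by `alpha` times the squared norm of the direction. Whether such a shift changes downstream behavior in a real model is the empirical question the Use criterion addresses.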


FrameNet-Cultures: A Benchmark for Evaluating LLMs via Cross-Cultural Frame Semantics
Neda Jamshidi | Anders Søgaard | Monica Bianchini
Findings of the Association for Computational Linguistics: ACL 2026

Large language models (LLMs) exhibit cultural biases, yet existing benchmarks rely on closed-form, domain-specific questionnaires. We introduce FRAMENET-CULTURES, a benchmark for evaluating cultural alignment in LLMs based on Fillmore-style frame semantics. Using the EveryCulture encyclopedia, we construct a lexicon of 18 cultural frames (e.g., greeting, child-rearing) across 20 countries, treating it as a structured reference for comparison rather than a definitive representation of contemporary societies. For each frame, we prompt five major LLMs (ChatGPT-5, Gemini-2.5-Flash, Mistral-Large, Qwen-3-Max, and DeepSeek-V3.2) three times to generate open-ended instantiations, which are manually annotated and binarized. We measure alignment with country- and continent-level profiles using normalized Hamming distance, and validate cultural recognizability through human evaluation of generated dialogues. Under culture-neutral prompting, outputs align most closely with European profiles, followed by Asian and American ones, indicating a consistent cross-model pattern. With culture-specific prompting, models shift toward the target regions, aligning most strongly with Africa for Ethiopia and with Asia for India. FRAMENET-CULTURES is the first open-ended benchmark for cultural alignment relying on frame semantics. Data, prompts, and annotations are publicly available at https://github.com/neda-jamshidi/FrameNet-Cultures.
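
The normalized Hamming distance mentioned above is the fraction of positions at which two equal-length binary vectors differ. The sketch below shows that computation on made-up vectors. The feature vectors and the names `model_output`, `profile_a`, and `profile_b` are hypothetical and are not taken from the benchmark's data or code.

```python
from typing import Sequence


def normalized_hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        raise ValueError("vectors must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical binarized annotations over six frame features (illustrative only).
model_output = [1, 0, 1, 1, 0, 0]
profile_a = [1, 0, 1, 0, 0, 0]
profile_b = [0, 1, 1, 0, 1, 0]

dist_a = normalized_hamming(model_output, profile_a)  # differs in 1 of 6 positions
dist_b = normalized_hamming(model_output, profile_b)  # differs in 4 of 6 positions
closest = "profile_a" if dist_a < dist_b else "profile_b"
print(f"dist_a={dist_a:.3f}, dist_b={dist_b:.3f}, closest={closest}")
```

A smaller distance indicates closer alignment, so in this toy case the output sits nearer to `profile_a`.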
