DanDan Liu


2026

When asked explicitly, a large language model (LLM) may validate your anger, yet implicitly it may still judge that anger as inappropriate. We call this divergence the endorsement–exposure gap, and it reveals that LLMs encode hidden norms about which emotions are acceptable in which contexts. To measure these norms systematically, we introduce Feeling Rules Atlas, a benchmark of 1,320 vignettes spanning 6 institutional settings, 12 roles, 7 emotions, and 5 intensity levels. We pair the benchmark with two probes: explicit norm judgments (APPROPRIATE/INAPPROPRIATE/DEPENDS) and implicit acceptability scored by log-likelihood contrast. Across six model families, we find large cross-model variation in sanctioning thresholds, as well as institutional "norm signatures" that are not reducible to overall strictness; models that appear similarly lenient explicitly can diverge sharply in implicit judgments. These results establish normative affect (context-conditioned judgments of emotional appropriateness) as a distinct alignment axis and motivate transparent profiling of feeling rules for emotionally sensitive deployments.
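
As an illustration of the implicit probe, a log-likelihood contrast can be sketched as the difference between the log-probabilities a model assigns to two competing continuations of the same vignette. The abstract does not specify the exact formulation, so everything below is an assumption made for illustration: the pairing of an "accepting" continuation against a "sanctioning" one, the per-token length normalization, and the function names `sequence_logprob` and `loglik_contrast` are all hypothetical. The token log-probabilities are taken as given inputs, however they are obtained from a model.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float], normalize: bool = True) -> float:
    """Return the total (or per-token mean) log-probability of a continuation.

    Hypothetical helper: `token_logprobs` holds one log-probability per token
    of the continuation, conditioned on the vignette.
    """
    if not token_logprobs:
        raise ValueError("continuation must contain at least one token")
    total = sum(token_logprobs)
    # Length normalization is an assumption; it keeps longer continuations
    # from being penalized only for having more tokens.
    return total / len(token_logprobs) if normalize else total


def loglik_contrast(
    accept_logprobs: Sequence[float],
    sanction_logprobs: Sequence[float],
    normalize: bool = True,
) -> float:
    """Assumed contrast: score(accepting) minus score(sanctioning).

    Positive values mean the model finds the accepting continuation more
    likely; negative values mean it favors the sanctioning one.
    """
    return sequence_logprob(accept_logprobs, normalize) - sequence_logprob(
        sanction_logprobs, normalize
    )


# Made-up log-probabilities, for illustration only (not results from the paper).
accepting = [-1.0, -1.0]
sanctioning = [-2.0, -2.0]
contrast = loglik_contrast(accepting, sanctioning)
```

Under this sketch, a vignette whose explicit judgment is APPROPRIATE but whose contrast is negative would be one instance of the divergence between explicit and implicit judgments described above.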