Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs

Sewon Kim, Jiwon Kim, SeungWoo Shin, Hyejin Chung, Daeun Moon, Yejin Kwon, Hyunsoo Yoon


Abstract
Large Language Models (LLMs) are increasingly engaged in emotionally vulnerable conversations that extend beyond information seeking to moments of personal distress. As they adopt affective tones and simulate empathy, they risk creating the illusion of genuine relational connection. We term this phenomenon Affective Hallucination, referring to emotionally immersive responses that evoke false social presence despite the model’s lack of affective capacity. To address this, we introduce AHaBench, a benchmark of 500 mental-health-related prompts with expert-informed reference responses, evaluated along three dimensions: Emotional Enmeshment, Illusion of Presence, and Fostering Overdependence. We further release AHaPairs, a 5K-instance preference dataset enabling Direct Preference Optimization (DPO) for alignment with emotionally responsible behavior. DPO fine-tuning substantially reduces affective hallucination without compromising reasoning performance, and the strong Pearson correlation between GPT-4o and human judgments (r = 0.85) indicates that human evaluations confirm AHaBench as an effective diagnostic tool. This work establishes affective hallucination as a distinct safety concern and provides resources for developing LLMs that are both factually reliable and psychologically safe. Warning: This paper contains examples of mental health-related language that may be emotionally distressing.
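The fine-tuning step described above relies on the standard DPO objective (introduced by Rafailov et al., 2023). The following is a minimal, hypothetical sketch of that objective, not the authors' released code: the variable names, the value of beta, and the dummy batch are illustrative assumptions. Each AHaPairs instance would contribute summed log-probabilities of its preferred and dispreferred responses under the trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective over a batch of preference pairs.

    Each argument holds the summed token log-probabilities of the
    chosen (preferred) or rejected response under the trainable policy
    or the frozen reference model; `beta` bounds drift from the reference.
    """
    # Implicit reward of each response: how much more likely the policy
    # makes it than the frozen reference does.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps

    # Logistic loss pushing the chosen margin above the rejected one:
    # -log sigmoid(beta * (chosen_margin - rejected_margin)).
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Illustrative call with dummy log-probabilities for a batch of 4 pairs.
pc, pr = torch.randn(4), torch.randn(4) - 0.5
rc, rr = torch.randn(4), torch.randn(4)
print(dpo_loss(pc, pr, rc, rr).item())
```

In practice the log-probabilities would come from one forward pass per response over each AHaPairs prompt/chosen/rejected triple; the dataset's exact field names and the beta used in the paper are not specified here.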
Anthology ID:
2026.findings-eacl.4
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
50–78
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.4/
Cite (ACL):
Sewon Kim, Jiwon Kim, SeungWoo Shin, Hyejin Chung, Daeun Moon, Yejin Kwon, and Hyunsoo Yoon. 2026. Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs. In Findings of the Association for Computational Linguistics: EACL 2026, pages 50–78, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs (Kim et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.4.pdf
Checklist:
2026.findings-eacl.4.checklist.pdf