Baoyi An


2026

Large language models exhibit significant potential for psychological support, yet they often generate fragmented and emotionally inconsistent dialogues that lack the therapeutic structure necessary for reliable assessment. To address these issues, we introduce **VeilEval**, a clinically grounded and privacy-preserving benchmark equipped with interpretable metrics for evaluating multi-turn psychological dialogues. Furthermore, we propose Emotion-Resonance (**EmoRes**), a multi-agent framework that strengthens psychological reasoning through a Topic-Mining Emotional Agent and a multi-perspective Self-Reflection Agent, thereby jointly improving topic continuity, emotional coherence, and clinical interpretability. Experiments show that EmoRes achieves up to ∼3× improvement over strong baselines on VeilEval, and ablation studies and human evaluations further validate its effectiveness.