Yihan Shi
2025
SocialEval: Evaluating Social Intelligence of Large Language Models
Jinfeng Zhou | Yuxuan Chen | Yihan Shi | Xuanming Zhang | Leqi Lei | Yi Feng | Zexuan Xiong | Miao Yan | Xunzhi Wang | Yaru Cao | Jianing Yin | Shuai Wang | Quanyu Dai | Zhenhua Dong | Hongning Wang | Minlie Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
LLMs exhibit promising Social Intelligence (SI) in modeling human behavior, raising the need to evaluate LLMs’ SI and their discrepancy with humans. SI equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation, which existing work fails to address. To this end, we propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts. Each script is structured as a world tree that contains plot lines driven by interpersonal ability, providing a comprehensive view of how LLMs navigate social interactions. Experiments show that LLMs fall behind humans on both SI evaluations, exhibit prosociality, and prefer more positive social behaviors, even if they lead to goal failure. Analysis of LLMs’ formed representation space and neuronal activations reveals that LLMs have developed ability-specific functional partitions akin to the human brain.
Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
Jinfeng Zhou | Yuxuan Chen | Jianing Yin | Yongkang Huang | Yihan Shi | Xikun Zhang | Libiao Peng | Rongsheng Zhang | Tangjie Lv | Zhipeng Hu | Hongning Wang | Minlie Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Cognitive Restructuring (CR) uses multi-turn dialogue to identify and restructure one’s negative thoughts, arising from mental health issues, into more helpful and positive ones. Clinician shortages and stigma motivate the development of human-LLM interactive psychotherapy for CR. Yet effective implementation of CR is hindered by entrenched cognitive distortions, emotional resistance, and individual differences, which existing work has not overcome. To bridge this gap, we propose CRDial, a novel framework that structures CR as a theory-grounded, multi-stage, multi-turn dialogue, integrating multi-aspect supportive strategies for emotional management and a multi-channel loop mechanism to account for diverse individual distortions. With CRDial, we distill Crisp, a large-scale and high-quality bilingual dialogue dataset, from an LLM. We then train Crispers, Crisp-based conversational LLMs for CR, at 7B and 14B scales. Extensive human studies show the superiority of Crispers in pointwise, pairwise, and intervention evaluations.
Co-authors
- Yuxuan Chen 2
- Minlie Huang 2
- Hongning Wang 2
- Jianing Yin 2
- Jinfeng Zhou 2