Yongsen Pan
2025
ECC: An Emotion-Cause Conversation Dataset for Empathy Response
Yuanyuan He | Yongsen Pan | Wei Li | Jiali You | Jiawen Deng | Fuji Ren
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Empathetic dialogue systems must understand both emotions and their underlying causes. However, existing datasets focus mainly on emotion labels, and cause annotations are added post hoc through costly and subjective manual processes. This leads to three limitations: subjective bias in cause labels, weak rationality due to ambiguous cause-emotion relationships, and high annotation costs that hinder scalability. To address these challenges, we propose ECC (Emotion-Cause Conversation Dataset), a scalable dataset of 2.4K dialogues and the first dialogue dataset in which conversations and their emotion-cause labels are generated automatically and jointly at creation time. We build EC-DD, an automatic extension framework for ECC that uses knowledge and large language models (LLMs) to generate conversations, and we train a causality-aware empathetic response model, CAER, on this dataset. Experimental results show that ECC achieves performance comparable or even superior to that of manually constructed empathetic dialogue datasets. Our code will be publicly released at https://github.com/Yuan-23/ECC