Synthetic Empathy: Generating and Evaluating Artificial Psychotherapy Dialogues to Detect Empathy in Counseling Sessions

Daniel Cabrera Lozoya, Eloy Hernandez Lua, Juan Alberto Barajas Perches, Mike Conway, Simon D’Alfonso


Abstract
Natural language processing (NLP) holds potential for analyzing psychotherapy transcripts. Nonetheless, gathering the necessary data to train NLP models for clinical tasks is a challenging process due to patient confidentiality regulations that restrict data sharing. To overcome this obstacle, we propose leveraging large language models (LLMs) to create synthetic psychotherapy dialogues that can be used to train NLP models for downstream clinical tasks. To evaluate the quality of our synthetic data, we trained three multi-task RoBERTa-based bi-encoder models, originally developed by Sharma et al., to detect empathy in dialogues. These models, initially trained on Reddit data, were developed alongside EPITOME, a framework designed to characterize empathetic communication in conversations. We collected and annotated 579 therapeutic interactions between therapists and patients using the EPITOME framework. Additionally, we generated 10,464 synthetic therapeutic dialogues using various LLMs and prompting techniques, all of which were annotated following the EPITOME framework. We conducted two experiments: one where we augmented the original dataset with synthetic data and another where we replaced the Reddit dataset with synthetic data. Our first experiment showed that incorporating synthetic data can improve the F1 score of empathy detection by up to 10%. The second experiment revealed no substantial differences between organic and synthetic data, as their performance remained on par when substituted.
Anthology ID:
2025.clpsych-1.13
Volume:
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Ayah Zirikly, Andrew Yates, Bart Desmet, Molly Ireland, Steven Bedrick, Sean MacAvaney, Kfir Bar, Yaakov Ophir
Venues:
CLPsych | WS
Publisher:
Association for Computational Linguistics
Pages:
157–171
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.clpsych-1.13/
Cite (ACL):
Daniel Cabrera Lozoya, Eloy Hernandez Lua, Juan Alberto Barajas Perches, Mike Conway, and Simon D’Alfonso. 2025. Synthetic Empathy: Generating and Evaluating Artificial Psychotherapy Dialogues to Detect Empathy in Counseling Sessions. In Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025), pages 157–171, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Synthetic Empathy: Generating and Evaluating Artificial Psychotherapy Dialogues to Detect Empathy in Counseling Sessions (Cabrera Lozoya et al., CLPsych 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.clpsych-1.13.pdf