Mina Kian
2025
Using Linguistic Entrainment to Evaluate Large Language Models for Use in Cognitive Behavioral Therapy
Mina Kian | Kaleen Shrestha | Katrin Fischer | Xiaoyuan Zhu | Jonathan Ong | Aryan Trehan | Jessica Wang | Gloria Chang | Séb Arnold | Maja Mataric
Findings of the Association for Computational Linguistics: NAACL 2025
Entrainment, the responsive communication between interacting individuals, is a crucial process in building a strong relationship between a mental health therapist and their client, leading to positive therapeutic outcomes. However, entrainment has not yet been investigated as a measure of the efficacy of large language models (LLMs) delivering mental health therapy. In this work, we evaluate the linguistic entrainment of an LLM (ChatGPT 3.5-turbo) in a mental health dialog setting. We first validate computational measures of linguistic entrainment with two measures of the quality of client self-disclosures: intimacy and engagement (p < 0.05). We then compare the linguistic entrainment of the LLM to that of trained therapists and non-expert online peer supporters in a cognitive behavioral therapy (CBT) setting. We show that the LLM is outperformed by humans with respect to linguistic entrainment (p < 0.001). These results support the need to be cautious in using LLMs out-of-the-box for mental health applications.
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala N. Tak | Amin Banayeeanzade | Anahita Bolourani | Mina Kian | Robin Jia | Jonathan Gratch
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) show promising capabilities in predicting human emotions from text. However, the mechanisms through which these models process emotional stimuli remain largely unexplored. Our study addresses this gap by investigating how autoregressive LLMs infer emotions, showing that emotion representations are functionally localized to specific regions in the model. Our evaluation includes diverse model families and sizes, and is supported by robustness checks. We then show that the identified representations are psychologically plausible by drawing on cognitive appraisal theory—a well-established psychological framework positing that emotions emerge from evaluations (appraisals) of environmental stimuli. By causally intervening on construed appraisal concepts, we steer the generation and show that the outputs align with theoretical and intuitive expectations. This work highlights a novel way to causally intervene and control emotion inference, potentially benefiting safety and alignment in sensitive affective domains.