Mostly Grounded, Occasionally Risky: Expert Evaluation of LLM-Generated Supervisory Feedback in a Psychotherapy Training Simulator

Adrian Montesano; Justin Bloomberg; Marc Pérez-Buriel

Mostly Grounded, Occasionally Risky: Expert Evaluation of LLM-Generated Supervisory Feedback in a Psychotherapy Training Simulator

Adrian Montesano, Justin Bloomberg, Marc Pérez-Buriel

Abstract

Automated feedback is increasingly cited as a key advantage of AI-based psychotherapy training, yet the clinical groundedness of LLM-generated supervisory feedback remains unevaluated. We present an expert evaluation of supervisory feedback generated by PRACTICE, an LLM-powered open-ended psychotherapy training simulator, across 21 feedback instances from four novice trainees. Two clinical psychology experts independently coded 167 feedback propositions as Justified, Unjustified, or Unsure. Inter-rater reliability was near-perfect (raw agreement = 98.2\%; $\kappa$ = 0.902). Of the 167 propositions, 149 (89.2\%) were rated Justified; however, 52.4\% of feedback instances contained at least one non-justified proposition, and qualitative analysis identified three recurring failure types: incorrect characterization, referential imprecision, and unclear communication. In clinical training contexts, even low error rates carry ethical weight: unjustified feedback risks reinforcing inappropriate clinical behaviors in trainees that can be trasnferred to real practice. These findings provide an initial empirical basis for the responsible deployment of LLM-generated feedback in clinical training and call for traceable, expert-auditable feedback architectures.

Anthology ID:: 2026.clpsych-1.24
Volume:: Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)
Month:: July
Year:: 2026
Address:: San Diego, California, USA
Editors:: Aya Zirikly, Kfir Bar, Sean MacAvaney, Molly Ireland, Yaakov Ophir, Dana Atzil-Slonim, Vasudha Varadarajan, Steven Bedrick, Bart Desmet
Venues:: CLPsych | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 298–305
Language:
URL:: https://preview.aclanthology.org/ingest-acl-workshops/2026.clpsych-1.24/
DOI:
Bibkey:
Cite (ACL):: Adrian Montesano, Justin Bloomberg, and Marc Pérez-Buriel. 2026. Mostly Grounded, Occasionally Risky: Expert Evaluation of LLM-Generated Supervisory Feedback in a Psychotherapy Training Simulator. In Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026), pages 298–305, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):: Mostly Grounded, Occasionally Risky: Expert Evaluation of LLM-Generated Supervisory Feedback in a Psychotherapy Training Simulator (Montesano et al., CLPsych 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl-workshops/2026.clpsych-1.24.pdf

PDF Cite Search Fix data