Elizabeth Stade


2026

Some psychotherapies, such as written exposure therapy for posttraumatic stress disorder, use "scripts" during parts of treatment, but verifying script adherence to ensure engagement of key mechanisms of change is a time-consuming step for therapy supervisors. Here, we formalize therapy script adherence as an NLP task and evaluate several simple (text similarity) and more complex (few-shot LLM) approaches. Over 351 annotated therapist utterance-script pairs, we find that text similarity approaches are highly competitive with LLMs and produce fewer false positives. ROUGE-L recall achieves F1 = 0.973, and BLEU achieves F1 = 0.972 with perfect precision (zero false positives). GPT-5.2 achieves F1 = 0.935 and GPT-4o-mini achieves F1 = 0.876. Given that the text similarity techniques are multiple orders of magnitude less complex, our results underscore the ability of simpler NLP techniques to remain effective in the age of LLMs for tasks that are more textual in nature, suggesting that aspects of therapist fidelity to evidence-based treatments can be assessed without using cloud API calls.
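As a rough illustration of the text similarity approach described in this abstract, the sketch below scores a therapist utterance against a script line using ROUGE-L recall (longest common subsequence length divided by script length) and applies a cutoff. The tokenizer, the 0.8 threshold, the function names, and the example sentences are all assumptions made for illustration; the abstract does not specify the paper's preprocessing or decision rule.

```python
import re


def tokenize(text):
    # Lowercase word tokenizer; an assumed preprocessing step, not the paper's.
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_length(a, b):
    # Length of the longest common subsequence of two token lists,
    # computed with a rolling-row dynamic program.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(script, utterance):
    # Fraction of script tokens recovered, in order, within the utterance.
    ref = tokenize(script)
    hyp = tokenize(utterance)
    if not ref:
        return 0.0
    return lcs_length(ref, hyp) / len(ref)


def is_adherent(script, utterance, threshold=0.8):
    # The threshold is a hypothetical value; in practice it would be
    # tuned on annotated utterance-script pairs.
    return rouge_l_recall(script, utterance) >= threshold


# Hypothetical script line and utterances, invented for this example.
script = "Write about the trauma as you look back on it now"
on_script = "Okay, so today I want you to write about the trauma as you look back on it now."
off_script = "How was your weekend?"

print(rouge_l_recall(script, on_script), is_adherent(script, on_script))
print(rouge_l_recall(script, off_script), is_adherent(script, off_script))
```

Recall is used rather than precision so that extra conversational words around the scripted content do not lower the score; only omitted script words do. Everything runs locally with the standard library, in line with the abstract's point that no cloud API calls are needed.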
AI systems for mental health are developed predominantly using data from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations, raising concerns about their validity, fairness, and generalizability across geo-cultural contexts. This limitation is especially consequential in mental health, where linguistic expression, symptom presentation, help-seeking behavior, and access to care vary substantially across populations. We argue that culturally responsive AI mental health systems require explicit attention to culture throughout the development lifecycle, from data collection to training and deployment. We present a sociotechnical framework for developing culturally responsive AI mental health applications to provide AI researchers and practitioners with an actionable roadmap for building more equitable, reliable, and contextually appropriate mental health technologies.

2025

Computational mental health research develops models to predict and understand psychological phenomena, but often relies on inappropriate measures of psychopathology constructs, undermining validity. We identify three key issues: (1) reliance on unvalidated measures (e.g., self-declared diagnosis) over validated ones (e.g., diagnosis by a clinician); (2) treating mental health constructs as categorical rather than dimensional; and (3) focusing on disorder-specific constructs instead of transdiagnostic ones. We outline the benefits of using validated, dimensional, and transdiagnostic measures and offer practical recommendations for practitioners. Using valid measures that reflect the nature and structure of psychopathology is essential for computational mental health research.