Benjamin Luft

Also published as: Benjamin J. Luft

2026

Language-based assessments have demonstrated high convergent validity with corresponding mental and physical health constructs, but they often fail to address discriminant validity: the measure’s ability to distinguish the target construct from related ones. This is common within the domain of mental health, and also arises with comorbid physical health conditions. Identifying key features of individual dimensions of mental and physical health present in language can unlock new avenues of research for natural language processing and psychology. We propose two augmentations to the objective function of the Ridge model, deriving closed-form solutions compatible with Singular Value Decomposition-based solvers, to enforce discriminant validity against off-target constructs using Mean Squared Error (MSE) and Squared Cosine Similarity (SCS), both of which are widely used in contrastive learning. By varying the discrimination strength, we find that a decrease of 0.005 Pearson correlation points can result in an increase upwards of 0.132 Pearson correlation points in discriminant validity for mental and physical health constructs derived from self-reported questionnaires. We see similar improvements across multiple fundamental psychopathology dimensions simultaneously, increasing discriminant validity by 0.012, with stronger increases coming from noisier, less reliable constructs. Our contributions provide a theoretically grounded path towards improving confidence in language-based assessments in the clinical sector, improving the specificity of these assessments to various areas of health.
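
The abstract does not give the augmented objective, so the following is only a minimal sketch of one way an MSE-based discrimination term could be added to Ridge while keeping an SVD-based closed form. The objective `||Xw - y||^2 + lam*||w||^2 - gamma*||Xw - z||^2`, the function name `discriminant_ridge_svd`, and the parameters `lam` and `gamma` are all assumptions made for illustration, not the authors' derivation. The Squared Cosine Similarity variant is not covered here.

```python
import numpy as np

def discriminant_ridge_svd(X, y, z, lam=1.0, gamma=0.1):
    """Ridge with a subtracted MSE term against an off-target construct z.

    Assumed objective (hypothetical, for illustration only):
        ||Xw - y||^2 + lam * ||w||^2 - gamma * ||Xw - z||^2
    Setting the gradient to zero gives
        ((1 - gamma) X^T X + lam I) w = X^T (y - gamma z),
    which is solved here through the thin SVD of X.
    Assumes X, y, and z are already centered (no intercept term).
    """
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    if not 0.0 <= gamma < 1.0:
        # gamma < 1 keeps the system positive definite
        raise ValueError("gamma must be in [0, 1)")
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    target = y - gamma * z
    # Per-singular-value shrinkage factor of the closed-form solution
    shrink = s / ((1.0 - gamma) * s**2 + lam)
    return Vt.T @ (shrink * (U.T @ target))
```

With `gamma=0` this reduces to ordinary Ridge regression, so under this assumed formulation `gamma` plays the role of the discrimination strength that the abstract describes varying.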

While NLP typically treats documents as independent and unordered samples, this assumption rarely holds in longitudinal studies: documents are nested within authors and ordered in time, forming person-indexed, time-ordered behavioral sequences. Here, we demonstrate the need for and propose a longitudinal modeling and evaluation paradigm that updates four parts of the NLP pipeline: (1) evaluation splits aligned to generalization over people (cross-sectional) and/or time (prospective); (2) accuracy metrics separating between-person differences from within-person dynamics; (3) sequence inputs to incorporate history by default; and (4) model internals that support different coarseness of latent state over histories (pooled summaries, explicit dynamics, or interaction-based models). We demonstrate the issues that arise with the traditional pipeline, and our proposed improvements, on a dataset of 17k daily diary transcripts paired with PTSD symptom severity from 238 participants, finding that traditional document-level evaluation can yield substantially different and sometimes reversed conclusions compared to our ecologically valid modeling and evaluation. We tie our results to a broader discussion motivating a shift from word-sequence evaluation toward behavior-sequence paradigms for NLP.
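
To make point (2) concrete, the sketch below separates a pooled document-level correlation into a between-person part (correlation of per-person means) and a within-person part (correlation of person-mean-centered values). This decomposition is a common convention and is assumed here; the abstract does not specify the paper's exact metrics. The function name `between_within_correlation` and the toy data are hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def between_within_correlation(person_ids, y_true, y_pred):
    """Split accuracy into between-person and within-person correlations."""
    person_ids = np.asarray(person_ids)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Map each document to an integer index for its author
    _, idx = np.unique(person_ids, return_inverse=True)
    counts = np.bincount(idx)
    mean_true = np.bincount(idx, weights=y_true) / counts
    mean_pred = np.bincount(idx, weights=y_pred) / counts
    return {
        # Ignores person structure entirely (document-level evaluation)
        "pooled": pearson(y_true, y_pred),
        # Do predicted person averages track true person averages?
        "between": pearson(mean_true, mean_pred),
        # Do predicted day-to-day deviations track true deviations?
        "within": pearson(y_true - mean_true[idx], y_pred - mean_pred[idx]),
    }

# Toy data: 3 people, 4 days each. Predictions get every person's
# average right but move in the wrong direction from day to day.
deviation = np.array([-1.5, -0.5, 0.5, 1.5])
person_ids = np.repeat(["a", "b", "c"], 4)
person_mean = np.repeat([0.0, 10.0, 20.0], 4)
y_true = person_mean + np.tile(deviation, 3)
y_pred = person_mean - np.tile(deviation, 3)
scores = between_within_correlation(person_ids, y_true, y_pred)
```

In this toy case the pooled correlation is strongly positive while the within-person correlation is negative, which illustrates how document-level evaluation can reverse a conclusion about within-person dynamics.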

2025

Current speech encoding pipelines often rely on an additional text-based LM to get robust representations of human communication, even though SotA speech-to-text models often have an LM within. This work proposes an approach to improve the LM within an audio model such that the subsequent text LM is unnecessary. We introduce **WhiSPA** (**Whi**sper with **S**emantic and **P**sychological **A**lignment), which leverages a novel audio training objective: contrastive loss with a language model embedding as a teacher. Using over 500k speech segments from mental health audio interviews, we evaluate the utility of aligning Whisper’s latent space with semantic representations from a text autoencoder (SBERT) and lexically derived embeddings of basic psychological dimensions: emotion and personality. Over self-supervised affective tasks and downstream psychological tasks, WhiSPA surpasses current speech encoders, achieving an average error reduction of 73.4% and 83.8%, respectively. WhiSPA demonstrates that it is not always necessary to run a subsequent text LM on speech-to-text output in order to get a rich psychological representation of human communication.
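
As a rough illustration of a contrastive loss with a teacher embedding, the sketch below computes an InfoNCE-style loss in NumPy: each student (audio) embedding should be most similar to the teacher (text) embedding of the same segment and less similar to the other teacher embeddings in the batch. The abstract does not state the exact loss used by WhiSPA, so the InfoNCE form, the function name `contrastive_teacher_loss`, and the `temperature` value are assumptions.

```python
import numpy as np

def contrastive_teacher_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss; row i of student is paired with row i of teacher.

    student, teacher: arrays of shape (batch, dim).
    Hypothetical sketch, not the published WhiSPA objective.
    """
    # L2-normalize so that dot products are cosine similarities
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal
    return float(-np.mean(np.diag(log_prob)))

# Toy check: perfectly aligned embeddings versus mismatched pairs
teacher = np.eye(4)
aligned_loss = contrastive_teacher_loss(np.eye(4), teacher)
mismatched_loss = contrastive_teacher_loss(np.roll(np.eye(4), 1, axis=0), teacher)
```

The aligned batch gives a loss near zero and the mismatched batch a much larger one, which is the signal that would pull the student's latent space toward the teacher's.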

Co-authors

Vasudha Varadarajan 1
