Johannes Eichstaedt
2026
Language-Based Detection of Adherence to Evidence-Based Psychotherapy Scripts
Samuel Campione | Elizabeth Stade | Stefanie Losavio | Shreya Singhvi | William Xuan | Tony Bui | Maria Martin Lopez | Shashanka Subrahmanya | Bailee Schuhmann | Courtney Worley | Shannon Wiltsey Stirman | Johannes Eichstaedt | H. Andrew Schwartz
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)
Some psychotherapies, such as written exposure therapy for posttraumatic stress disorder, utilize "scripts" during parts of treatment, but verifying script adherence to ensure engagement of key mechanisms of change is a time-consuming step for therapy supervisors. Here, we formalize therapy script adherence as an NLP task, and evaluate several simple (text similarity) and more complex (few-shot LLM) approaches. Across 351 annotated therapist utterance-script pairs, we find text similarity approaches to be highly competitive with LLMs and to produce fewer false positives. ROUGE-L recall achieves F1 = 0.973, and BLEU achieves F1 = 0.972 with full precision and zero false positives. GPT-5.2 achieves F1 = 0.935 and GPT-4o-mini achieves F1 = 0.876. Given that the text similarity techniques are multiple orders of magnitude less complex, our results underscore the ability of simpler NLP techniques to remain effective in the age of LLMs for tasks that are more textual in nature, suggesting that aspects of therapist fidelity to evidence-based treatments can be assessed without using cloud API calls.
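The text-similarity baseline can be illustrated with a minimal ROUGE-L recall sketch. It takes the longest common subsequence (LCS) of tokens between a script and a therapist utterance and divides it by the script's token count, so an utterance that reproduces the script in order scores near 1. This is an illustration under stated assumptions, not the authors' implementation. Whitespace tokenization, lowercasing, and treating the script as the reference are all assumptions. The hypothetical `is_adherent` helper and its 0.5 threshold are assumptions too.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] = LCS length of the tokens of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match diagonally
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from one side
        prev = curr
    return prev[-1]


def rouge_l_recall(script: str, utterance: str) -> float:
    """ROUGE-L recall with the script as reference: LCS / number of script tokens.

    Assumption: simple lowercase whitespace tokenization.
    """
    ref = script.lower().split()
    hyp = utterance.lower().split()
    if not ref:
        return 0.0
    return lcs_length(ref, hyp) / len(ref)


def is_adherent(script: str, utterance: str, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag adherence when recall meets an assumed threshold."""
    return rouge_l_recall(script, utterance) >= threshold
```

For example, `rouge_l_recall("please write about your worst traumatic event", "okay so please write about the worst event")` finds the in-order subsequence "please write about worst event", which covers 5 of the 7 script tokens, giving 5/7 ≈ 0.714. Recall plausibly suits this framing because extra conversational words in the utterance do not lower the score. Only missing or reordered script content does.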
2024
Using Daily Language to Understand Drinking: Multi-Level Longitudinal Differential Language Analysis
Matthew Matero | Huy Vu | August Nilsson | Syeda Mahwish | Young Min Cho | James McKay | Johannes Eichstaedt | Richard Rosenthal | Lyle Ungar | H. Andrew Schwartz
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)
Analyses linking language with psychological factors or behaviors predominantly treat linguistic features as a static set, working with a single document per person or aggregating across multiple posts (e.g. on social media) into a single set of features. This limits language mostly to shedding light on between-person differences rather than within-person changes in behavior. Here, we collected a novel dataset of daily surveys where participants were asked to describe their experienced well-being and report the number of alcoholic beverages they had within the past 24 hours. Using these data, we first build a multi-level forecasting model that is able to capture within-person change and leverage both the psychological features of the person and daily well-being responses. Then, we propose a longitudinal version of differential language analysis that finds patterns associated with drinking more (e.g. social events) and less (e.g. task-oriented), as well as patterns distinguishing heavy drinkers from light drinkers.
2023
Discourse-Level Representations can Improve Prediction of Degree of Anxiety
Swanie Juhng | Matthew Matero | Vasudha Varadarajan | Johannes Eichstaedt | Adithya V Ganesan | H. Andrew Schwartz
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Anxiety disorders are the most common mental illnesses, but relatively little is known about how to detect them from language. The primary clinical manifestation of anxiety is worry-associated cognitive distortions, which are likely expressed at the discourse level of semantics. Here, we investigate the development of a modern linguistic assessment for degree of anxiety, specifically evaluating the utility of discourse-level information in addition to lexical-level large language model embeddings. We find that a combined lexico-discourse model outperforms models based solely on state-of-the-art contextual embeddings (RoBERTa), with discourse-level representations derived from Sentence-BERT and DiscRE both providing additional predictive power not captured by lexical-level representations. Interpreting the model, we find that discourse patterns of causal explanations, among others, were used significantly more by those scoring high in anxiety, dovetailing with psychological literature.
2022
WWBP-SQT-lite: Multi-level Models and Difference Embeddings for Moments of Change Identification in Mental Health Forums
Adithya V Ganesan | Vasudha Varadarajan | Juhi Mittal | Shashanka Subrahmanya | Matthew Matero | Nikita Soni | Sharath Chandra Guntuku | Johannes Eichstaedt | H. Andrew Schwartz
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology
Psychological states unfold dynamically; to understand and measure mental health at scale we need to detect and measure these changes from sequences of online posts. We evaluate two approaches to capturing psychological changes in text: the first relies on computing the difference between the embedding of a message and that of the message preceding it, the second relies on a “human-aware” multi-level recurrent transformer (HaRT). The mood changes of timeline posts of users were annotated into three classes, ‘ordinary,’ ‘switching’ (positive to negative or vice versa) and ‘escalations’ (increasing in intensity). For classifying these mood changes, the difference-between-embeddings technique, applied to RoBERTa embeddings, showed the highest overall F1 score (0.61) across the three different classes on the test set. The technique particularly outperformed the HaRT transformer (and other baselines) in the detection of switches (F1 = .33) and escalations (F1 = .61). Consistent with the literature, the language use patterns associated with mental-health related constructs in prior work (including depression, stress, anger and anxiety) predicted both mood switches and escalations.
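The first approach can be sketched as follows, assuming each post in a user's timeline has already been embedded (for instance with RoBERTa, which is not shown here) and the embeddings are stacked in chronological order. The function name `difference_embeddings` is hypothetical. So is the choice to compare the first post against a zero vector. The abstract does not say how the first post is handled, or whether the difference is combined with the original embedding before classification.

```python
import numpy as np


def difference_embeddings(timeline: np.ndarray) -> np.ndarray:
    """Return per-post difference features for a chronologically ordered timeline.

    `timeline` has shape (num_posts, dim); row t of the output is
    timeline[t] - timeline[t - 1].

    Assumption: the first post has no predecessor, so it is compared against a
    zero vector (its difference feature equals its own embedding).
    """
    timeline = np.asarray(timeline, dtype=float)
    if timeline.ndim != 2:
        raise ValueError("expected a 2-D array of shape (num_posts, dim)")
    # Shift the timeline down by one row, padding the top with zeros,
    # so prev[t] is the embedding of the post preceding post t.
    prev = np.vstack([np.zeros((1, timeline.shape[1])), timeline[:-1]])
    return timeline - prev
```

The resulting rows could then be passed to a classifier over the three classes (‘ordinary,’ ‘switching,’ ‘escalations’). Which classifier was used is not stated in this listing.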
2020
Understanding Weekly COVID-19 Concerns through Dynamic Content-Specific LDA Topic Modeling
Mohammadzaman Zamani | H. Andrew Schwartz | Johannes Eichstaedt | Sharath Chandra Guntuku | Adithya Virinchipuram Ganesan | Sean Clouston | Salvatore Giorgi
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
The novelty and global scale of the COVID-19 pandemic have led to rapid societal changes in a short span of time. As government policy and health measures shift, public perceptions and concerns also change, an evolution documented within discourse on social media. We propose a dynamic content-specific LDA topic modeling technique that can help to identify different domains of COVID-specific discourse that can be used to track societal shifts in concerns or views. Our experiments show that these model-derived topics are more coherent than standard LDA topics, and also provide new features that are more helpful in prediction of COVID-19 related outcomes including social mobility and unemployment rate.
Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings
Roshan Santosh | H. Andrew Schwartz | Johannes Eichstaedt | Lyle Ungar | Sharath Chandra Guntuku
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach generalizes to the problem of detecting Adverse Drug Reaction (ADR). We find that the approach applied to Twitter data can detect symptom mentions substantially before they are reported by the Centers for Disease Control (CDC).
Explaining the Trump Gap in Social Distancing Using COVID Discourse
Austin Van Loon | Sheridan Stewart | Brandon Waldon | Shrinidhi K Lakshmikanth | Ishan Shah | Sharath Chandra Guntuku | Garrick Sherman | James Zou | Johannes Eichstaedt
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
Our ability to limit the future spread of COVID-19 will in part depend on our understanding of the psychological and sociological processes that lead people to follow or reject coronavirus health behaviors. We argue that the virus has taken on heterogeneous meanings in communities across the United States and that these disparate meanings shaped communities’ response to the virus during the early, vital stages of the outbreak in the U.S. Using word embeddings, we demonstrate that counties where residents socially distanced less on average (as measured by residential mobility) more semantically associated the virus in their COVID discourse with concepts of fraud, the political left, and more benign illnesses like the flu. We also show that the different meanings the virus took on in different communities explain a substantial fraction of what we call the “Trump Gap”, or the empirical tendency for more Trump-supporting counties to socially distance less. This work demonstrates that community-level processes of meaning-making in part determined behavioral responses to the COVID-19 pandemic and that these processes can be measured unobtrusively using Twitter.
2017
DLATK: Differential Language Analysis ToolKit
H. Andrew Schwartz | Salvatore Giorgi | Maarten Sap | Patrick Crutchley | Lyle Ungar | Johannes Eichstaedt
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present Differential Language Analysis Toolkit (DLATK), an open-source Python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object-oriented principles to make it easy to tie in additional libraries or storage technologies.
2016
Modelling Valence and Arousal in Facebook posts
Daniel Preoţiuc-Pietro | H. Andrew Schwartz | Gregory Park | Johannes Eichstaedt | Margaret Kern | Lyle Ungar | Elisabeth Shulman
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Does ‘well-being’ translate on Twitter?
Laura Smith | Salvatore Giorgi | Rishi Solanki | Johannes Eichstaedt | H. Andrew Schwartz | Muhammad Abdul-Mageed | Anneke Buffone | Lyle Ungar
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2015
The role of personality, age, and gender in tweeting about mental illness
Daniel Preoţiuc-Pietro | Johannes Eichstaedt | Gregory Park | Maarten Sap | Laura Smith | Victoria Tobolsky | H. Andrew Schwartz | Lyle Ungar
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality
Extracting Human Temporal Orientation from Facebook Language
H. Andrew Schwartz | Gregory Park | Maarten Sap | Evan Weingarten | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Jonah Berger | Martin Seligman | Lyle Ungar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2014
Towards Assessing Changes in Degree of Depression through Facebook
H. Andrew Schwartz | Johannes Eichstaedt | Margaret L. Kern | Gregory Park | Maarten Sap | David Stillwell | Michal Kosinski | Lyle Ungar
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality
Developing Age and Gender Predictive Lexica over Social Media
Maarten Sap | Gregory Park | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Lyle Ungar | Hansen Andrew Schwartz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach
Hansen Andrew Schwartz | Johannes Eichstaedt | Eduardo Blanco | Lukasz Dziurzynski | Margaret L. Kern | Stephanie Ramones | Martin Seligman | Lyle Ungar
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
Co-authors
- H. Andrew Schwartz 14
- Lyle Ungar 10
- Margaret Kern 5
- Gregory Park 5
- Maarten Sap 5
- Sharath Chandra Guntuku 4
- Salvatore Giorgi 3
- Michal Kosinski 3
- Matthew Matero 3
- David Stillwell 3
- Daniel Preoţiuc-Pietro 2
- Martin Seligman 2
- Laura Smith 2
- Shashanka Subrahmanya 2
- Adithya V. Ganesan 2
- Vasudha Varadarajan 2
- Muhammad Abdul-Mageed 1
- Jonah Berger 1
- Eduardo Blanco 1
- Anneke Buffone 1
- Tony Bui 1
- Samuel Campione 1
- Young Min Cho 1
- Sean Clouston 1
- Patrick Crutchley 1
- Lukasz Dziurzynski 1
- Swanie Juhng 1
- Shrinidhi K Lakshmikanth 1
- Austin Van Loon 1
- Stefanie Losavio 1
- Syeda Mahwish 1
- Maria Martin Lopez 1
- James McKay 1
- Juhi Mittal 1
- August Håkan Nilsson 1
- Stephanie Ramones 1
- Richard Rosenthal 1
- Roshan Santhosh 1
- Bailee Schuhmann 1
- Ishan Shah 1
- Garrick Sherman 1
- Elisabeth Shulman 1
- Shreya Singhvi 1
- Rishi Solanki 1
- Nikita Soni 1
- Elizabeth Stade 1
- Sheridan Stewart 1
- Victoria Tobolsky 1
- Adithya Virinchipuram Ganesan 1
- Huy Vu 1
- Brandon Waldon 1
- Evan Weingarten 1
- Shannon Wiltsey Stirman 1
- Courtney Worley 1
- William Xuan 1
- Mohammadzaman Zamani 1
- James Zou 1