JT Wolohan


2020

Estimating the effect of COVID-19 on mental health: Linguistic indicators of depression during a global pandemic
JT Wolohan
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

This preliminary analysis uses a deep LSTM neural network with fastText embeddings to predict population rates of depression on Reddit in order to estimate the effect of COVID-19 on mental health. We find that year over year, depression rates on Reddit are up 50%, suggesting a 15-million-person increase in the number of depressed Americans and a $7.5 billion increase in depression-related spending. This finding suggests that there is utility in NLP approaches to longitudinal public-health surveillance.

2018

Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP
JT Wolohan | Misato Hiraga | Atreyee Mukherjee | Zeeshan Ali Sayyed | Matthew Millard
Proceedings of the First International Workshop on Language Cognition and Computational Models

Natural language processing researchers have proven the ability of machine learning approaches to detect depression-related cues from language; however, to date, these efforts have primarily assumed it was acceptable to leave depression-related texts in the data. Our concerns with this are twofold: first, that the models may be overfitting on depression-related signals, which may not be present in all depressed users (only those who talk about depression on social media); and second, that these models would underperform for users who are sensitive to the public stigma of depression. This study demonstrates the validity of those concerns. We construct a novel corpus of texts from 12,106 Reddit users and perform lexical and predictive analyses under two conditions: one where all text produced by the users is included and one where the depression data is withheld. We find significant differences in the language used by depressed users under the two conditions as well as a difference in the ability of machine learning algorithms to correctly detect depression. However, despite the lexical differences and reduced classification performance (each of which suggests that users may be able to fool algorithms by avoiding direct discussion of depression), a still-respectable overall performance suggests lexical models are reasonably robust and well suited for a role in a diagnostic or monitoring capacity.