Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP

JT Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed, Matthew Millard


Abstract
Natural language processing researchers have proven the ability of machine learning approaches to detect depression-related cues from language; however, to date, these efforts have primarily assumed it was acceptable to leave depression-related texts in the data. Our concerns with this are twofold: first, that the models may be overfitting on depression-related signals, which may not be present in all depressed users (only those who talk about depression on social media); and second, that these models would under-perform for users who are sensitive to the public stigma of depression. This study demonstrates the validity of those concerns. We construct a novel corpus of texts from 12,106 Reddit users and perform lexical and predictive analyses under two conditions: one where all text produced by the users is included and one where the depression data is withheld. We find significant differences in the language used by depressed users under the two conditions as well as a difference in the ability of machine learning algorithms to correctly detect depression. However, despite the lexical differences and reduced classification performance (each of which suggests that users may be able to fool algorithms by avoiding direct discussion of depression), a still respectable overall performance suggests lexical models are reasonably robust and well suited for a role in a diagnostic or monitoring capacity.
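The two conditions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how depression-related text is identified, so the `Post` record, the `build_conditions` helper, and the `WITHHELD_SUBREDDITS` set are all hypothetical, and withholding by subreddit name is only one assumed way to restrict topic.

```python
from typing import Dict, List, NamedTuple, Set


class Post(NamedTuple):
    """Hypothetical record for one Reddit post."""
    user: str
    subreddit: str
    text: str


# Assumption: depression-related text is identified by the forum it was
# posted in. This set is illustrative, not taken from the paper.
WITHHELD_SUBREDDITS: Set[str] = {"depression"}


def build_conditions(
    posts: List[Post],
    withheld: Set[str] = WITHHELD_SUBREDDITS,
) -> Dict[str, Dict[str, List[str]]]:
    """Group post texts by user under two conditions.

    "all_text" keeps every post; "topic_restricted" drops posts whose
    subreddit (lowercased) appears in `withheld`.
    """
    conditions: Dict[str, Dict[str, List[str]]] = {
        "all_text": {},
        "topic_restricted": {},
    }
    for post in posts:
        conditions["all_text"].setdefault(post.user, []).append(post.text)
        if post.subreddit.lower() not in withheld:
            conditions["topic_restricted"].setdefault(post.user, []).append(post.text)
    return conditions
```

In this sketch, a user whose posts all fall in withheld forums has no entry in the topic-restricted condition; how such users are handled in the actual study is not stated in the abstract.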
Anthology ID:
W18-4102
Volume:
Proceedings of the First International Workshop on Language Cognition and Computational Models
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
LCCM
Publisher:
Association for Computational Linguistics
Pages:
11–21
URL:
https://aclanthology.org/W18-4102
Cite (ACL):
JT Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed, and Matthew Millard. 2018. Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP. In Proceedings of the First International Workshop on Language Cognition and Computational Models, pages 11–21, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP (Wolohan et al., LCCM 2018)
PDF:
https://preview.aclanthology.org/auto-file-uploads/W18-4102.pdf