David Brent
2021
Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data
Paul Pu Liang | Terrance Liu | Anna Cai | Michal Muszynski | Ryo Ishii | Nick Allen | Randy Auerbach | David Brent | Ruslan Salakhutdinov | Louis-Philippe Morency
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.
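The identity-obfuscation idea described in this abstract can be illustrated with a common technique for removing a sensitive attribute from intermediate representations: adversarial training with a gradient reversal layer. The sketch below is a minimal illustration, not the paper's specific method; it assumes PyTorch, and all module and variable names (`PrivacyPreservingMoodModel`, `mood_head`, `id_head`, `lam`) are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): adversarial training to
# obfuscate user identity in intermediate representations while keeping them
# predictive of mood. Assumes PyTorch; all names are hypothetical.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class PrivacyPreservingMoodModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_users, lam=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mood_head = nn.Linear(hidden_dim, 1)      # predicts daily mood score
        self.id_head = nn.Linear(hidden_dim, n_users)  # adversary: predicts user identity
        self.lam = lam

    def forward(self, x):
        z = self.encoder(x)
        mood = self.mood_head(z)
        # The adversary sees z through a gradient-reversal layer, so the
        # encoder is pushed to *remove* identity information from z.
        ident = self.id_head(GradReverse.apply(z, self.lam))
        return mood, ident
```

Training would minimize the mood loss plus the adversary's identity cross-entropy; because the adversary's gradients are reversed before reaching the encoder, the encoder is driven toward representations from which user identity cannot be recovered.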
2019
CLPsych2019 Shared Task: Predicting Suicide Risk Level from Reddit Posts on Multiple Forums
Victor Ruiz | Lingyun Shi | Wei Quan | Neal Ryan | Candice Biernesser | David Brent | Rich Tsui
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
We aimed to predict an individual's suicide risk level from longitudinal posts on Reddit discussion forums. Through participating in a shared task competition hosted by CLPsych2019, we received two annotated datasets: a training dataset with 496 users (31,553 posts) and a test dataset with 125 users (9,610 posts). We submitted results from our three best-performing machine-learning models: SVM, Naïve Bayes, and an ensemble model. Each model assigned a user's suicide risk level in four categories, i.e., no risk, low risk, moderate risk, and severe risk. Among the three models, the ensemble model had the best macro-averaged F1 score of 0.379 when tested on the holdout test dataset. The Naïve Bayes model had the best performance in two additional binary-classification tasks, i.e., no risk vs. flagged risk (any risk level other than no risk) with an F1 score of 0.836, and no or low risk vs. urgent risk (moderate or severe risk) with an F1 score of 0.736. We conclude that the Naïve Bayes model may serve as a tool for identifying users with flagged or urgent suicide risk based on longitudinal posts on Reddit discussion forums.
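For readers unfamiliar with the metrics above, here is a minimal sketch, assuming scikit-learn, of how the macro-averaged F1 and the two binary collapses of the four-level labels can be computed; the label names are hypothetical.

```python
# Illustrative sketch, assuming scikit-learn: the four-class macro-averaged F1
# plus the two binary regroupings described in the abstract. Label names are
# hypothetical placeholders.
from sklearn.metrics import f1_score

LEVELS = ["none", "low", "moderate", "severe"]  # hypothetical label names

def evaluate(y_true, y_pred):
    # Four-class macro F1: unweighted mean of per-class F1 scores.
    macro = f1_score(y_true, y_pred, average="macro", labels=LEVELS)
    # Binary task 1: no risk vs. flagged risk (any level other than "none").
    flagged = f1_score([y != "none" for y in y_true],
                       [y != "none" for y in y_pred])
    # Binary task 2: no/low risk vs. urgent risk ("moderate" or "severe").
    urgent = f1_score([y in ("moderate", "severe") for y in y_true],
                      [y in ("moderate", "severe") for y in y_pred])
    return macro, flagged, urgent
```

Macro averaging weights the four risk levels equally regardless of their frequency, so poor performance on any single level lowers the overall score; this is consistent with the four-class score (0.379) being much lower than the two binary scores.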