Our increasingly digitized lives generate troves of data that reflect our behavior, beliefs, mood, and wellbeing. Such “digital life data” provides crucial and long-lacking insight into the lives of patients outside the healthcare setting, from a better understanding of mundane patterns of exercise and sleep routines to harbingers of emotional crisis. Moreover, information about individual differences and personalities is encoded in digital life data. In this paper we examine the relationship between mood and movement using linguistic and biometric data, respectively. Does increased physical activity (movement) have an effect on a person’s mood (or vice versa)? We find that weak group-level relationships between movement and mood mask interesting and often strong relationships between the two for individuals within the group. We describe these individual differences, and argue that individual variability in the relationship between movement and mood is one of many such factors that ought to be taken into account in wellbeing-focused apps and AI systems.
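As a minimal illustration of how a weak group-level relationship can mask strong individual ones, the sketch below compares per-person and pooled Pearson correlations. The two people, their (movement, mood) values, and the function name `pearson` are invented for illustration and are not drawn from the paper's data or method.

```python
from math import sqrt

# Hypothetical (made-up) daily observations per person: (movement, mood).
data = {
    "person_a": [(1, 1), (2, 2), (3, 3), (4, 4)],  # mood rises with movement
    "person_b": [(1, 4), (2, 3), (3, 2), (4, 1)],  # mood falls with movement
}

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Correlation computed separately for each individual.
per_person = {name: pearson(pairs) for name, pairs in data.items()}

# Correlation computed after pooling everyone's observations together.
pooled = pearson([pair for pairs in data.values() for pair in pairs])

print(per_person)  # strong, opposite-signed individual relationships
print(pooled)      # the pooled relationship vanishes
```

In this toy case the two individuals have perfectly correlated data with opposite signs, so the pooled estimate cancels out; real data would show a less extreme version of the same effect.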
Progress on NLP for mental health — indeed, for healthcare in general — is hampered by obstacles to shared, community-level access to relevant data. We report on what is, to our knowledge, the first attempt to address this problem in mental health by conducting a shared task using sensitive data in a secure data enclave. Participating teams received access to Twitter posts donated for research, including data from users with and without suicide attempts, and did all work with the dataset entirely within a secure computational environment. We discuss the task, team results, and lessons learned to set the stage for future tasks on sensitive or confidential data.
Prevailing methods for assessing population-level mental health require costly collection of large samples of data through instruments such as surveys, and are thus slow to reflect current, rapidly changing social conditions. This constrains how easily population-level mental health data can be integrated into health and policy decision-making. Here, we demonstrate that natural language processing applied to publicly available social media data can provide real-time estimates of psychological distress in the population (specifically, English-speaking Twitter users in the US). We examine population-level changes in linguistic correlates of mental health symptoms in response to the COVID-19 pandemic and to the killing of George Floyd. As a case study, we focus on social media data from healthcare providers, compared to a control sample. Our results provide a concrete demonstration of how the tools of computational social science can be applied to provide real-time or near-real-time insight into the impact of public events on mental health.
In this article, we examine social media data as a lens onto support-seeking among women veterans of the US armed forces. Social media data hold a great deal of promise as a source of information on needs and support-seeking among individuals who are excluded from or systematically prevented from accessing clinical or other institutions ostensibly designed to support them. We apply natural language processing (NLP) techniques to more than 3 million Tweets collected from 20,000 Twitter users. We find evidence that women veterans are more likely than their male counterparts to use social media to seek social and community engagement, and that they discuss mental health and veterans’ issues significantly more frequently. By contrast, male veterans tend to use social media to amplify political ideologies or to engage in partisan debate. Our results have implications for how organizations can provide outreach and services to this uniquely vulnerable population, and illustrate the utility of non-traditional observational data sources such as social media for understanding the needs of marginalized groups.
Depression is a global mental health condition that affects all cultures, yet the way depression is expressed varies by culture. Uptake of machine learning technology for diagnosing mental health conditions means that a growing number of depression classifiers are built from online language data. Yet culture is rarely considered in this literature as a factor affecting online language. This study explores cultural differences in online language data of users with depression. Written language data from 1,593 users with self-reported depression from the online peer support community 7 Cups of Tea was analyzed using the Linguistic Inquiry and Word Count (LIWC), topic modeling, data visualization, and other techniques. We compared the language of users identifying as White, Black or African American, Hispanic or Latino, and Asian or Pacific Islander. Exploratory analyses revealed cross-cultural differences in depression expression in online language data, particularly in relation to emotion expression, cognition, and functioning. The results have important implications for avoiding depression misclassification from machine-driven assessments when used in a clinical setting, and for avoiding inadvertent cultural biases in this line of research more broadly.
Social media have transformed data-driven research in political science, the social sciences, health, and medicine. Since health research often touches on sensitive topics that relate to ethics of treatment and patient privacy, similar ethical considerations should be acknowledged when using social media data in health research. While much has been said regarding the ethical considerations of social media research, health research leads to an additional set of concerns. We provide practical suggestions in the form of guidelines for researchers working with social media data in health research. These guidelines can inform an IRB proposal for researchers new to social media health research.
In this paper, we provide the first quantified exploration of the structure of the language of dreams, their linguistic style and emotional content. We present a collection of digital dream logs as a viable corpus for the growing study of mental health through the lens of language, complementary to the work done examining more traditional social media. This paper is largely exploratory, intended to lay the groundwork for subsequent research in mental health rather than to optimize performance on a particular text classification task.
Many psychological phenomena occur in small time windows, measured in minutes or hours. However, most computational linguistic techniques look at data on the order of weeks, months, or years. We explore micropatterns in sequences of messages occurring over a short time window for their prevalence and power for quantifying psychological phenomena, specifically, patterns in affect. We examine affective micropatterns in social media posts from users with anxiety, eating disorders, panic attacks, schizophrenia, suicidality, and matched controls.
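One simple way to picture an affective micropattern is as a transition between the affect labels of two consecutive messages posted close together in time. The sketch below counts such transitions; the posts, the labels "pos" and "neg", the one-hour window, and the function name `affect_transitions` are all illustrative assumptions, not the definition or data used in the paper.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical (made-up) posts from one user: (timestamp, affect label).
posts = [
    (datetime(2020, 1, 1, 9, 0), "neg"),
    (datetime(2020, 1, 1, 9, 20), "neg"),
    (datetime(2020, 1, 1, 9, 45), "pos"),
    (datetime(2020, 1, 1, 15, 0), "neg"),
    (datetime(2020, 1, 1, 15, 30), "pos"),
]

def affect_transitions(posts, window=timedelta(hours=1)):
    """Count (previous affect, next affect) pairs for consecutive posts
    whose timestamps are no more than `window` apart."""
    counts = Counter()
    ordered = sorted(posts)  # chronological order
    for (t1, a1), (t2, a2) in zip(ordered, ordered[1:]):
        if t2 - t1 <= window:
            counts[(a1, a2)] += 1
    return counts

transitions = affect_transitions(posts)
print(transitions)
```

Comparing such transition counts between groups of users would be one possible way to quantify short-window patterns in affect; longer sequences or other window sizes are straightforward variations.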
Schizophrenia is one of the most disabling and difficult to treat of all human health conditions, ranking in the top ten causes of disability worldwide. It has been a puzzle in part due to the difficulty of identifying its fundamental components. Several studies have shown that some manifestations of schizophrenia (e.g., the negative symptoms that include blunting of speech prosody, as well as the disorganization symptoms that lead to disordered language) can be understood from the perspective of linguistics. However, schizophrenia research has not kept pace with technologies in computational linguistics, especially in semantics and pragmatics. We therefore examine the writings of schizophrenia patients, analyzing their syntax, semantics, and pragmatics. In addition, we analyze tweets of (self-proclaimed) schizophrenia patients who publicly discuss their diagnoses. For the writing samples dataset, syntactic features are found to be the most successful in classification, whereas for the less structured Twitter dataset, a combination of features performs best.