Abstract
Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples. However, an emerging body of literature has raised new concerns regarding the validity of these types of methods for use in clinical applications. To further understand the robustness of distantly supervised mental health models, we explore the generalization ability of machine learning classifiers trained to detect depression in individuals across multiple social media platforms. Our experiments not only reveal that substantial loss occurs when transferring between platforms, but also that there exist several unreliable confounding factors that may enable researchers to overestimate classification performance. Based on these results, we enumerate recommendations for future mental health dataset construction.

- Anthology ID: 2020.findings-emnlp.337
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
- Month: November
- Year: 2020
- Address: Online
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3774–3788
- URL: https://aclanthology.org/2020.findings-emnlp.337
- DOI: 10.18653/v1/2020.findings-emnlp.337
- Cite (ACL): Keith Harrigian, Carlos Aguirre, and Mark Dredze. 2020. Do Models of Mental Health Based on Social Media Data Generalize?. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3774–3788, Online. Association for Computational Linguistics.
- Cite (Informal): Do Models of Mental Health Based on Social Media Data Generalize? (Harrigian et al., Findings 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.findings-emnlp.337.pdf
- Code: kharrigian/emnlp-2020-mental-health-generalization
- Data: SMHD