Abstract
Models for identifying depression using social media text exhibit biases towards different gender and racial/ethnic groups. The representation and balance of groups within a dataset contribute to these biases, but differences in content and social media use may further explain them. We present an analysis of the content of social media posts from different demographic groups. Our analysis shows that there are content differences between depression and control subgroups across demographic groups, and that temporal topics and demographic-specific topics are correlated with downstream depression model error. We discuss the implications of our work for creating future datasets, as well as for designing and training models for mental health.
- Anthology ID:
- 2021.clpsych-1.19
- Volume:
- Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- CLPsych
- Publisher:
- Association for Computational Linguistics
- Pages:
- 169–180
- URL:
- https://aclanthology.org/2021.clpsych-1.19
- DOI:
- 10.18653/v1/2021.clpsych-1.19
- Cite (ACL):
- Carlos Aguirre and Mark Dredze. 2021. Qualitative Analysis of Depression Models by Demographics. In Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, pages 169–180, Online. Association for Computational Linguistics.
- Cite (Informal):
- Qualitative Analysis of Depression Models by Demographics (Aguirre & Dredze, CLPsych 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.clpsych-1.19.pdf