Abstract
In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different unsupervised feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook “likes” and “status updates” to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for illicit drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user’s social media behavior (e.g., word usage) and substance use.
- Anthology ID:
- D17-1241
- Volume:
- Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Martha Palmer, Rebecca Hwa, Sebastian Riedel
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2275–2284
- URL:
- https://aclanthology.org/D17-1241
- DOI:
- 10.18653/v1/D17-1241
- Cite (ACL):
- Tao Ding, Warren K. Bickel, and Shimei Pan. 2017. Multi-View Unsupervised User Feature Embedding for Social Media-based Substance Use Prediction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2275–2284, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Multi-View Unsupervised User Feature Embedding for Social Media-based Substance Use Prediction (Ding et al., EMNLP 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/D17-1241.pdf