Mark T. Rutledge
2020
SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing
Isabel Metzger | Emir Y. Haskovic | Allison Black | Whitley M. Yi | Rajat S. Chandra | Mark T. Rutledge | William McMahon | Yindalon Aphinyanaphongs
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
This paper presents our approach to multi-class text categorization of tweets that mention prescription medications, labeling each tweet as indicative of potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U) using natural language processing techniques. Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also created word embeddings on domain-specific social media and medical corpora. Our hybrid pipeline of an attention-based CNN with post-processing was the best-performing system in Task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.