SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing
Isabel Metzger, Emir Y. Haskovic, Allison Black, Whitley M. Yi, Rajat S. Chandra, Mark T. Rutledge, William McMahon, Yindalon Aphinyanaphongs
Abstract
This paper presents our approach to multi-class text categorization of tweets that mention prescription medications, classifying each tweet as indicative of potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U) using natural language processing techniques. Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also trained word embeddings on domain-specific social media and medical corpora. Our hybrid pipeline, an attention-based CNN with post-processing, was the best-performing system in Task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.
- Anthology ID:
- 2020.smm4h-1.9
- Volume:
- Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Graciela Gonzalez-Hernandez, Ari Z. Klein, Ivan Flores, Davy Weissenbacher, Arjun Magge, Karen O'Connor, Abeed Sarker, Anne-Lyse Minard, Elena Tutubalina, Zulfat Miftahutdinov, Ilseyar Alimova
- Venue:
- SMM4H
- Publisher:
- Association for Computational Linguistics
- Pages:
- 57–62
- URL:
- https://aclanthology.org/2020.smm4h-1.9
- Cite (ACL):
- Isabel Metzger, Emir Y. Haskovic, Allison Black, Whitley M. Yi, Rajat S. Chandra, Mark T. Rutledge, William McMahon, and Yindalon Aphinyanaphongs. 2020. SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing. In Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, pages 57–62, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing (Metzger et al., SMM4H 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.smm4h-1.9.pdf
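The abstract reports system quality as an F1 score for a single class (class A, potential abuse/misuse) rather than an average over all four labels. The sketch below shows one common way to compute such a per-class score, treating the target class one-vs-rest over the four labels A, C, M and U. It assumes the standard precision/recall/F1 definitions; the shared task's official scorer is not described here and may differ in details. The helper name `class_f1` and the toy labels are hypothetical and are not taken from the shared-task data.

```python
def class_f1(gold, pred, target="A"):
    """Precision, recall, and F1 for one target class, scored one-vs-rest.

    Hypothetical helper: every label other than `target` (here C, M, U)
    counts as the negative class.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positives: gold and predicted labels are both the target class
    tp = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    # False positives: predicted as target, but gold is another class
    fp = sum(1 for g, p in zip(gold, pred) if g != target and p == target)
    # False negatives: gold is target, but predicted as another class
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold/predicted labels (hypothetical, for illustration only)
gold = ["A", "C", "M", "A", "U", "A", "C", "M"]
pred = ["A", "A", "M", "C", "U", "A", "C", "A"]
p, r, f = class_f1(gold, pred, target="A")
print(f"class A: P={p:.2f} R={r:.2f} F1={f:.2f}")  # class A: P=0.50 R=0.67 F1=0.57
```

Scoring only class A reflects the task's focus on the abuse/misuse category: a system can score well on the frequent classes while still missing most abuse/misuse tweets, and a single-class F1 exposes that.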