Ensemble BERT for Classifying Medication-mentioning Tweets

Huong Dang, Kahyun Lee, Sam Henry, Özlem Uzuner


Abstract
Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. In this article, we describe our submission to Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2020. This task challenged participants to detect tweets that mention medications or dietary supplements in a natural, highly imbalanced dataset. Our system combined a handcrafted preprocessing step with an ensemble of 20 BERT-based classifiers, generated by dividing the training dataset into subsets using 10-fold cross-validation and exploiting two BERT embedding models. Our system ranked first in this task and improved the average F1 score across all participating teams by 19.07%, with a precision, recall, and F1 on the test set of 83.75%, 87.01%, and 85.35%, respectively.
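
The abstract describes an ensemble of 20 classifiers obtained by crossing two BERT embedding models with 10 cross-validation folds. Below is a minimal sketch of that construction, assuming Hugging Face transformers, a simple majority vote over the members, and hypothetical model identifiers; the paper's actual preprocessing, hyperparameters, and combination rule are given in the full text, not here.

import numpy as np
import torch
from torch.utils.data import Dataset
from sklearn.model_selection import StratifiedKFold
from transformers import (AutoTokenizer,
                          AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

class TweetDataset(Dataset):
    """Wraps tokenized tweets and binary labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

def train_ensemble(texts, labels, model_names, n_folds=10):
    """Fine-tune one classifier per (embedding model, CV fold) pair:
    2 models x 10 folds = 20 ensemble members."""
    members = []
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for name in model_names:
        tok = AutoTokenizer.from_pretrained(name)
        enc = tok(texts, truncation=True, padding=True)
        for fold, (train_idx, _) in enumerate(skf.split(texts, labels)):
            model = AutoModelForSequenceClassification.from_pretrained(
                name, num_labels=2)
            train_ds = TweetDataset(
                {k: [v[i] for i in train_idx] for k, v in enc.items()},
                [labels[i] for i in train_idx])
            args = TrainingArguments(
                output_dir=f"out/{name.replace('/', '_')}_fold{fold}",
                num_train_epochs=3, per_device_train_batch_size=16)
            Trainer(model=model, args=args, train_dataset=train_ds).train()
            members.append((name, model))
    return members

def predict_majority(members, texts):
    """Assumed combination rule: simple majority vote over all members."""
    votes = []
    for name, model in members:
        tok = AutoTokenizer.from_pretrained(name)
        enc = tok(texts, truncation=True, padding=True, return_tensors="pt")
        model.eval()
        with torch.no_grad():
            logits = model(**enc).logits
        votes.append(logits.argmax(dim=-1).numpy())
    return (np.mean(votes, axis=0) >= 0.5).astype(int)

# Example usage (model identifiers are illustrative, not the paper's):
# members = train_ensemble(tweets, labels,
#                          ["bert-base-uncased", "vinai/bertweet-base"])
# preds = predict_majority(members, test_tweets)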
Anthology ID:
2020.smm4h-1.5
Volume:
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Graciela Gonzalez-Hernandez, Ari Z. Klein, Ivan Flores, Davy Weissenbacher, Arjun Magge, Karen O'Connor, Abeed Sarker, Anne-Lyse Minard, Elena Tutubalina, Zulfat Miftahutdinov, Ilseyar Alimova
Venue:
SMM4H
Publisher:
Association for Computational Linguistics
Pages:
37–41
URL:
https://aclanthology.org/2020.smm4h-1.5
Cite (ACL):
Huong Dang, Kahyun Lee, Sam Henry, and Özlem Uzuner. 2020. Ensemble BERT for Classifying Medication-mentioning Tweets. In Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, pages 37–41, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Ensemble BERT for Classifying Medication-mentioning Tweets (Dang et al., SMM4H 2020)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2020.smm4h-1.5.pdf
Data:
SMM4H