2020
Sentence Contextual Encoder with BERT and BiLSTM for Automatic Classification with Imbalanced Medication Tweets
Olanrewaju Tahir Aduragba | Jialin Yu | Gautham Senthilnathan | Alexandra Cristea
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
This paper describes the system and approach used by our team for Task 1 of the SMM4H 2020 competition, which targets the automatic classification of tweets that mention medication. Inspired by the effectiveness of within-task further pre-training and sentence encoders, we adapted the standard BERT pretrain-then-fine-tune approach to include an intermediate training stage, in which a BiLSTM network acts as a further fine-tuning step. We show that this approach works well on a highly imbalanced dataset, in which the positive class accounts for only 0.2% of the tweets. Our model outperformed the mean score of all competition participants in both F1 and precision, and achieved a competitive recall score.
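To illustrate the architecture outlined in the abstract, the sketch below runs a BERT encoder over each tweet, feeds the per-token outputs through a BiLSTM that builds a sentence representation, and classifies the pooled result. It is a minimal sketch using PyTorch and Hugging Face Transformers, not the authors' released code: the model name, hidden size, mean-pooling strategy, and classification head are assumptions made for the example, and the class-imbalance handling (e.g. a weighted loss) is not shown.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BertBiLSTMClassifier(nn.Module):
    """BERT token embeddings -> BiLSTM sentence encoder -> linear classifier (sketch)."""

    def __init__(self, model_name="bert-base-uncased", lstm_hidden=256, num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        # BiLSTM over BERT's per-token outputs acts as a sentence encoder.
        self.bilstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings: (batch, seq_len, hidden).
        tokens = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # BiLSTM output: (batch, seq_len, 2 * lstm_hidden).
        lstm_out, _ = self.bilstm(tokens)
        # Mean-pool over non-padding tokens to get one vector per tweet.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertBiLSTMClassifier()
batch = tokenizer(
    ["took two ibuprofen for this headache"],
    return_tensors="pt", padding=True, truncation=True,
)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 2)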