IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets

Varad Pimpalkhute, Prajwal Nakhate, Tausif Diwan


Abstract
With increasing users sharing health-related information on social media, there has been a rise in using social media for health monitoring and surveillance. In this paper, we present a system that addresses classic health-related binary classification problems presented in Tasks 1a, 4, and 8 of the 6th edition of Social Media Mining for Health Applications (SMM4H) shared tasks. We developed a system based on RoBERTa (for Task 1a & 4) and BioBERT (for Task 8). Furthermore, we address the challenge of the imbalanced dataset and propose techniques such as undersampling, oversampling, and data augmentation to overcome the imbalanced nature of a given health-related dataset.
Anthology ID:
2021.smm4h-1.24
Volume:
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Month:
June
Year:
2021
Address:
Mexico City, Mexico
Editors:
Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre-Maduell, Salvador Lima Lopez, Ivan Flores, Karen O'Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan M Banda, Martin Krallinger, Graciela Gonzalez-Hernandez
Venue:
SMM4H
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
118–122
Language:
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.smm4h-1.24/
DOI:
10.18653/v1/2021.smm4h-1.24
Bibkey:
Cite (ACL):
Varad Pimpalkhute, Prajwal Nakhate, and Tausif Diwan. 2021. IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pages 118–122, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets (Pimpalkhute et al., SMM4H 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.smm4h-1.24.pdf