IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets

Varad Pimpalkhute; Prajwal Nakhate; Tausif Diwan

doi:10.18653/v1/2021.smm4h-1.24

Abstract

As more users share health-related information on social media, the use of social media for health monitoring and surveillance has grown. In this paper, we present a system that addresses the health-related binary classification problems posed in Tasks 1a, 4, and 8 of the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared tasks. Our system is based on RoBERTa (for Tasks 1a and 4) and BioBERT (for Task 8). Furthermore, we address the challenge of imbalanced datasets and propose techniques such as undersampling, oversampling, and data augmentation to overcome the class imbalance of a given health-related dataset.
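The abstract names undersampling and oversampling but does not say how they were carried out. The following is a minimal sketch of generic random resampling on labeled tweets, not the authors' implementation; the function names, the `(text, label)` tuple layout, and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def group_by_label(examples):
    """Group (text, label) pairs by label. Hypothetical helper."""
    groups = defaultdict(list)
    for text, label in examples:
        groups[label].append((text, label))
    return groups

def random_oversample(examples, seed=0):
    """Duplicate randomly chosen minority examples until every
    class is as large as the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = max(len(items) for items in groups.values())
    balanced = []
    for items in groups.values():
        balanced.extend(items)
        # Draw the missing examples with replacement.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced

def random_undersample(examples, seed=0):
    """Keep a random subset of each class so every class is as
    small as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = min(len(items) for items in groups.values())
    balanced = []
    for items in groups.values():
        # Draw without replacement.
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced

# Toy imbalanced data (invented): 6 negative tweets, 2 positive tweets.
toy = [("tweet %d" % i, 0) for i in range(6)]
toy += [("adverse event tweet %d" % i, 1) for i in range(2)]

oversampled = random_oversample(toy)
undersampled = random_undersample(toy)
print(len(oversampled), len(undersampled))
```

Oversampling keeps every original example at the cost of repeated minority tweets, while undersampling discards majority tweets; in practice either would be applied to the training split only, so that evaluation data keeps its original class distribution.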

Anthology ID: 2021.smm4h-1.24
Volume: Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Month: June
Year: 2021
Address: Mexico City, Mexico
Editors: Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre-Maduell, Salvador Lima Lopez, Ivan Flores, Karen O'Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan M Banda, Martin Krallinger, Graciela Gonzalez-Hernandez
Venue: SMM4H
Publisher: Association for Computational Linguistics
Pages: 118–122
URL: https://preview.aclanthology.org/fix-sig-urls/2021.smm4h-1.24/
DOI: 10.18653/v1/2021.smm4h-1.24
Cite (ACL): Varad Pimpalkhute, Prajwal Nakhate, and Tausif Diwan. 2021. IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pages 118–122, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets (Pimpalkhute et al., SMM4H 2021)
PDF: https://preview.aclanthology.org/fix-sig-urls/2021.smm4h-1.24.pdf
