Abstract
With an increasing number of users sharing health-related information on social media, there has been a rise in using social media for health monitoring and surveillance. In this paper, we present a system that addresses the classic health-related binary classification problems posed in Tasks 1a, 4, and 8 of the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared tasks. We developed a system based on RoBERTa (for Tasks 1a and 4) and BioBERT (for Task 8). Furthermore, we address the challenge of imbalanced datasets and propose techniques such as undersampling, oversampling, and data augmentation to overcome the imbalanced nature of a given health-related dataset.
- Anthology ID:
- 2021.smm4h-1.24
- Volume:
- Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
- Month:
- June
- Year:
- 2021
- Address:
- Mexico City, Mexico
- Venue:
- SMM4H
- Publisher:
- Association for Computational Linguistics
- Pages:
- 118–122
- URL:
- https://aclanthology.org/2021.smm4h-1.24
- DOI:
- 10.18653/v1/2021.smm4h-1.24
- Cite (ACL):
- Varad Pimpalkhute, Prajwal Nakhate, and Tausif Diwan. 2021. IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pages 118–122, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets (Pimpalkhute et al., SMM4H 2021)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2021.smm4h-1.24.pdf
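The abstract names undersampling and oversampling as remedies for class imbalance but does not give the sampling ratios or procedure the authors used. The block below is a minimal, generic sketch of *random* oversampling (duplicating minority-class tweets) and *random* undersampling (dropping majority-class tweets) until both classes have equal counts. It is an assumption for illustration, not the paper's method. The function names `random_oversample` and `random_undersample` are hypothetical, and data augmentation (e.g. paraphrasing tweets) is not shown.

```python
import random
from collections import Counter


def _shuffle_pairs(texts, labels, rng):
    # Shuffle texts and labels together so each text keeps its label.
    pairs = list(zip(texts, labels))
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [y for _, y in pairs]


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen examples of each
    smaller class (sampling with replacement) until every class matches
    the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        if n < target:
            pool = [t for t, y in zip(texts, labels) if y == cls]
            extra = rng.choices(pool, k=target - n)  # with replacement
            out_texts.extend(extra)
            out_labels.extend([cls] * len(extra))
    return _shuffle_pairs(out_texts, out_labels, rng)


def random_undersample(texts, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class (without
    replacement) so every class matches the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_texts, out_labels = [], []
    for cls in counts:
        pool = [t for t, y in zip(texts, labels) if y == cls]
        keep = rng.sample(pool, target)  # without replacement
        out_texts.extend(keep)
        out_labels.extend([cls] * target)
    return _shuffle_pairs(out_texts, out_labels, rng)


if __name__ == "__main__":
    # Toy binary dataset: 8 negative tweets, 2 positive tweets.
    texts = [f"tweet {i}" for i in range(10)]
    labels = [0] * 8 + [1] * 2
    _, over_labels = random_oversample(texts, labels)
    _, under_labels = random_undersample(texts, labels)
    print(Counter(over_labels), Counter(under_labels))
```

On this toy set, oversampling yields 8 examples per class (16 total) and undersampling yields 2 per class (4 total). Oversampling keeps all original data but can encourage overfitting to repeated minority tweets. Undersampling avoids duplicates but discards majority-class data, which is why the two are often compared or combined with augmentation.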