Varad Pimpalkhute


2021

IIITN NLP at SMM4H 2021 Tasks: Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets
Varad Pimpalkhute | Prajwal Nakhate | Tausif Diwan
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

With an increasing number of users sharing health-related information on social media, there has been a rise in using social media for health monitoring and surveillance. In this paper, we present a system that addresses the classic health-related binary classification problems posed in Tasks 1a, 4, and 8 of the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared tasks. We developed a system based on RoBERTa (for Tasks 1a and 4) and BioBERT (for Task 8). Furthermore, we address the challenge of imbalanced datasets and propose techniques such as undersampling, oversampling, and data augmentation to overcome the imbalanced nature of a given health-related dataset.
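The abstract does not spell out how the rebalancing was applied, but one of the named techniques, random oversampling of the minority class before fine-tuning, can be sketched in a few lines. This is an illustrative assumption, not the authors' exact pipeline; the toy tweets and the function name `random_oversample` are hypothetical.

```python
import random

def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples (with replacement) until both classes are equal in size."""
    random.seed(seed)
    pos = [(t, l) for t, l in zip(texts, labels) if l == 1]
    neg = [(t, l) for t, l in zip(texts, labels) if l == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Sample extra minority examples to close the gap with the majority class.
    extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    random.shuffle(balanced)
    return [t for t, _ in balanced], [l for _, l in balanced]

# Hypothetical imbalanced Twitter data: one positive (health-related) tweet, three negatives.
texts = ["felt dizzy after the second dose", "nice weather today", "great game last night", "lovely dinner"]
labels = [1, 0, 0, 0]
bal_texts, bal_labels = random_oversample(texts, labels)
print(sum(bal_labels), len(bal_labels) - sum(bal_labels))  # equal positive and negative counts
```

Undersampling would instead drop majority-class examples, and data augmentation would generate new minority-class texts rather than duplicating existing ones.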

2020

UPennHLP at WNUT-2020 Task 2 : Transformer models for classification of COVID19 posts on Twitter
Arjun Magge | Varad Pimpalkhute | Divya Rallapalli | David Siguenza | Graciela Gonzalez-Hernandez
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Increasing usage of social media presents new, non-traditional avenues for monitoring disease outbreaks, virus transmission, and disease progression through user posts describing test results or disease symptoms. However, discussions of infectious diseases that are informative in nature also span topics such as news, politics, and humor, which makes data mining challenging. We present a system to identify tweets about the COVID-19 outbreak that are deemed informative for use in downstream applications. The system achieved an F1-score of 0.8941, precision of 0.9028, recall of 0.8856, and accuracy of 0.9010. In the shared task organized as part of the 6th Workshop on Noisy User-generated Text (W-NUT), the system was ranked 18th by F1-score and 13th by accuracy.
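As a quick sanity check (not part of the paper), the reported F1-score is consistent with the harmonic mean of the reported precision and recall:

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported scores from the abstract: P = 0.9028, R = 0.8856 -> F1 ≈ 0.8941
print(round(f1_from_pr(0.9028, 0.8856), 4))
```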