Millon Das


2022

Which One Is More Toxic? Findings from Jigsaw Rate Severity of Toxic Comments
Millon Das | Punyajoy Saha | Mithun Das
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)

The proliferation of online hate speech has necessitated algorithms that can detect toxicity. Most past research treats this detection as a classification task, but assigning an absolute toxicity label is often tricky. Hence, a few past works recast the same task as regression. This paper presents a comparative evaluation of different transformers and traditional machine learning models on a recently released toxicity-severity dataset from Jigsaw. We further use explainability analysis to expose issues with the models' predictions.
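Under the regression framing, a model assigns each comment a real-valued severity score, so any two comments can be ranked by comparing their scores. The sketch below assumes pairwise evaluation in the style suggested by the title, where annotators mark which of two comments is more toxic, and measures the fraction of such pairs a model orders correctly. The function name `pairwise_agreement`, the tie-handling rule, and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def pairwise_agreement(
    scores: Dict[str, float],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (less_toxic, more_toxic) pairs that the scores order correctly.

    Hypothetical helper: ties count as disagreement (an assumption; other
    conventions exist).
    """
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("need at least one annotated pair")
    # A pair agrees when the less toxic comment receives the strictly lower score.
    correct = sum(scores[less] < scores[more] for less, more in pair_list)
    return correct / len(pair_list)


# Hypothetical regression outputs (severity scores) for three comments.
scores = {"c1": 0.10, "c2": 0.85, "c3": 0.40}
# Hypothetical annotator judgements, each written as (less toxic, more toxic).
pairs = [("c1", "c2"), ("c3", "c2"), ("c2", "c3")]
print(pairwise_agreement(scores, pairs))
```

Because only the ordering of scores matters here, any monotonic rescaling of a model's outputs leaves this agreement unchanged, which is one reason a regression score suits a "which one is more toxic?" comparison.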

Enolp musk@SMM4H’22: Leveraging Pre-trained Language Models for Stance and Premise Classification
Millon Das | Archit Mangrulkar | Ishan Manchanda | Manav Kapadnis | Sohan Patnaik
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes our approaches to the Social Media Mining for Health (SMM4H) Shared Tasks 2a and 2b. In addition to the baseline architectures, we experiment with part-of-speech (PoS), dependency-parse, and TF-IDF features. We also perform contrastive pretraining on our best models using a supervised contrastive loss function. In both tasks, we outperformed the mean and median scores and ranked first on the validation set. On the test set, we achieved an F1-score of 0.636 for stance classification using the CovidTwitterBERT model, and an F1-score of 0.664 for premise classification using the BART-base model.
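The abstract does not give the exact form of the contrastive loss. As an illustration only, the sketch below implements the supervised contrastive (SupCon) loss of Khosla et al. (2020), assuming that is the kind of loss meant: for each anchor embedding, same-label examples in the batch act as positives and are pulled together, while all other examples appear in the denominator, using cosine similarity scaled by a temperature. The function name, the temperature of 0.1, and the plain-Python list representation of embeddings are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # Project onto the unit sphere so dot products become cosine similarities.
    # Assumes the vector is non-zero.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """SupCon loss in the 'L_out' form of Khosla et al. (2020), averaged over anchors.

    Illustrative sketch: anchors with no other same-label example are skipped.
    """
    if len(embeddings) != len(labels) or len(embeddings) < 2:
        raise ValueError("need at least two embeddings with matching labels")
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in others
        }
        # Numerically stable log-sum-exp over all non-anchor samples (the denominator).
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch contains no positive pairs")
    return total / anchors


# Hypothetical 2-D embeddings forming two clusters.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(batch, [0, 0, 1, 1]))  # labels match the clusters
print(supervised_contrastive_loss(batch, [0, 1, 0, 1]))  # labels cut across them
```

The loss is low when labels agree with the embedding geometry and high when they do not, which is the signal contrastive pretraining uses to reshape the encoder's representation space before fine-tuning.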