Steve Durairaj Swamy


2020

NIT-Agartala-NLP-Team at SemEval-2020 Task 8: Building Multimodal Classifiers to Tackle Internet Humor
Steve Durairaj Swamy | Shubham Laddha | Basil Abdussalam | Debayan Datta | Anupam Jamatia
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The paper describes the systems submitted to SemEval-2020 Task 8: Memotion by the ‘NIT-Agartala-NLP-Team’. A dataset of 8879 memes was made available by the task organizers to train and test our models. Our systems include a Logistic Regression baseline, a BiLSTM+Attention-based learner and a transfer learning approach with BERT. For the three sub-tasks A, B and C, we attained ranks 24/33, 11/29 and 15/26, respectively. We highlight our difficulties in harnessing image information, as well as some techniques and handcrafted features we employed to overcome these issues. We also discuss various modelling issues and theorize about possible solutions and the reasons why these problems persist.

2019

NIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck | Amitava Das
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on ‘Identifying and Categorizing Offensive Language in Social Media’ by the ‘NIT_Agartala_NLP_Team’. An annotated dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B and C, respectively. The paper highlights the overall low predictive power of various linguistic features and surface-level count features, as well as the limitations of a traditional machine learning approach when compared to a deep learning counterpart.

Studying Generalisability across Abusive Language Detection Datasets
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, there is a great deal of redundancy and a lack of generalisability across datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence, a hierarchical annotation model is used here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.