Deepak Kumar


2022

Infrrd.ai at SemEval-2022 Task 11: A system for named entity recognition using data augmentation, transformer-based sequence labeling model, and EnsembleCRF
Jianglong He | Akshay Uppal | Mamatha N | Shiv Vignesh | Deepak Kumar | Aditya Kumar Sarda
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In low-resource languages, the amount of training data is limited, so the model has to perform well on sentences and syntax it has not seen during training. We propose a method that addresses this problem through an encoder and an ensemble of language models. A language-specific language model performed poorly compared to a multilingual language model, so the multilingual language model checkpoint is fine-tuned for each specific language. A novel one-hot encoder is introduced between the model outputs and the CRF to combine the results in an ensemble format. Our team, Infrrd.ai, competed in the MultiCoNER competition. The results are encouraging: the team placed within the top 10, and in most of the tracks we participated in, the difference from the third-placed team is less than 4%. The proposed method shows that an ensemble of models with a multilingual language model as the base, combined with the help of an encoder, performs better than a single language-specific model.
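The abstract does not spell out how the one-hot encoder feeds the CRF, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each ensemble member emits one BIO tag per token, one-hot encodes those tags, concatenates them per token, and decodes the result with a linear-chain CRF (Viterbi). The tag set, the fixed "vote" emission weights, and the hand-set BIO transition penalties are all hypothetical; a trained CRF would learn both weight matrices from data.

```python
"""Sketch: one-hot encode each ensemble member's tag predictions and decode
the combined features with a linear-chain CRF (Viterbi).

Hypothetical assumptions: the tag set, the fixed vote-style emission weights
and the hand-set transition penalties (a real CRF learns these)."""
from typing import List, Sequence

TAGS: List[str] = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # hypothetical tag set


def one_hot(tag: str, tags: Sequence[str]) -> List[float]:
    vec = [0.0] * len(tags)
    vec[list(tags).index(tag)] = 1.0
    return vec


def ensemble_features(predictions: List[List[str]], tags: Sequence[str]) -> List[List[float]]:
    """Per token, concatenate the one-hot vectors of every model's predicted tag."""
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must tag the same token sequence")
    return [
        [x for preds in predictions for x in one_hot(preds[i], tags)]
        for i in range(n_tokens)
    ]


def vote_weights(n_models: int, tags: Sequence[str]) -> List[List[float]]:
    """Hypothetical emission weights: tag t gains +1 for each model that predicted t."""
    k = len(tags)
    return [[1.0 if j % k == t else 0.0 for j in range(n_models * k)] for t in range(k)]


def bio_transitions(tags: Sequence[str], penalty: float = -10.0) -> List[List[float]]:
    """Hypothetical transition scores trans[prev][cur]: penalise I-X unless
    preceded by B-X or I-X. Start-of-sentence transitions are omitted."""
    trans = [[0.0] * len(tags) for _ in tags]
    for c, cur in enumerate(tags):
        if cur.startswith("I-"):
            allowed = {"B-" + cur[2:], cur}
            for p, prev in enumerate(tags):
                if prev not in allowed:
                    trans[p][c] = penalty
    return trans


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF."""
    n_tags = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for em in emissions[1:]:
        new_score, ptr = [], []
        for cur in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[best_prev] + transitions[best_prev][cur] + em[cur])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    best = max(range(n_tags), key=lambda t: score[t])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]


def ensemble_crf_decode(predictions: List[List[str]], tags: Sequence[str] = TAGS) -> List[str]:
    feats = ensemble_features(predictions, tags)
    weights = vote_weights(len(predictions), tags)
    # Emission score of tag t for a token is a linear function of its one-hot features.
    emissions = [
        [sum(w * x for w, x in zip(weights[t], f)) for t in range(len(tags))]
        for f in feats
    ]
    return [tags[t] for t in viterbi(emissions, bio_transitions(tags))]


if __name__ == "__main__":
    # Two of three models say "O" for the first token, but O -> I-PER is penalised,
    # so the CRF prefers B-PER, I-PER over the per-token majority vote.
    print(ensemble_crf_decode([["O", "I-PER"], ["O", "I-PER"], ["B-PER", "I-PER"]]))
    # → ['B-PER', 'I-PER']
```

The point of placing a CRF after the one-hot features is visible in the usage line: a per-token majority vote would output the invalid sequence `O, I-PER`, while the transition scores let the decoder choose a consistent BIO sequence.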

2021

NLP@NISER: Classification of COVID19 tweets containing symptoms
Deepak Kumar | Nalin Kumar | Subhankar Mishra
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

In this paper, we describe our approaches for Task 6 of the Social Media Mining for Health Applications (SMM4H) 2021 shared task. The task is to classify tweets containing COVID-19 symptoms into three classes (self-reports, non-personal reports, and literature/news mentions). We implemented BERT and XLNet for this text classification task. The best result was achieved by the XLNet approach, with an F1 score of 0.94, precision of 0.9448, and recall of 0.94448. This is slightly better than the average score (F1 score 0.93, precision 0.93235, and recall 0.93235).