2022
pdf
abs
CMNEROne at SemEval-2022 Task 11: Code-Mixed Named Entity Recognition by leveraging multilingual data
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Identifying named entities is, in general, a practical and challenging task in the field of Natural Language Processing. Named Entity Recognition on the code-mixed text is further challenging due to the linguistic complexity resulting from the nature of the mixing. This paper addresses the submission of team CMNEROne to the SEMEVAL 2022 shared task 11 MultiCoNER. The Code-mixed NER task aimed to identify named entities on the code-mixed dataset. Our work consists of Named Entity Recognition (NER) on the code-mixed dataset by leveraging the multilingual data. We achieved a weighted average F1 score of 0.7044, i.e., 6% greater than the NER baseline.
pdf
abs
DepressionOne@LT-EDI-ACL2022: Using Machine Learning with SMOTE and Random UnderSampling to Detect Signs of Depression on Social Media Text.
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Depression is a common and serious medical illness that negatively affects how you feel, the way you think, and how you act. Detecting depression is essential as it must be treated early to avoid painful consequences. Nowadays, people are broadcasting how they feel via posts and comments. Using social media, we can extract many comments related to depression and use NLP techniques to train and detect depression. This work presents the submission of the DepressionOne team at LT-EDI-2022 for the shared task, detecting signs of depression from social media text. The depression data is small and unbalanced. Thus, we have used oversampling and undersampling methods such as SMOTE and RandomUnderSampler to represent the data. Later, we used machine learning methods to train and detect the signs of depression.
2021
pdf
abs
Graph Convolutional Networks with Multi-headed Attention for Code-Mixed Sentiment Analysis
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Code-mixing is a frequently observed phenomenon in multilingual communities where a speaker uses multiple languages in an utterance or sentence. Code-mixed texts are abundant, especially in social media, and pose a problem for NLP tools as they are typically trained on monolingual corpora. Recently, finding the sentiment from code-mixed text has been attempted by some researchers in SentiMix SemEval 2020 and Dravidian-CodeMix FIRE 2020 shared tasks. Mostly, the attempts include traditional methods, long short term memory, convolutional neural networks, and transformer models for code-mixed sentiment analysis (CMSA). However, no study has explored graph convolutional neural networks on CMSA. In this paper, we propose the graph convolutional networks (GCN) for sentiment analysis on code-mixed text. We have used the datasets from the Dravidian-CodeMix FIRE 2020. Our experimental results on multiple CMSA datasets demonstrate that the GCN with multi-headed attention model has shown an improvement in classification metrics.
pdf
abs
OFFLangOne@DravidianLangTech-EACL2021: Transformers with the Class Balanced Loss for Offensive Language Identification in Dravidian Code-Mixed text.
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
The intensity of online abuse has increased in recent years. Automated tools are being developed to prevent the use of hate speech and offensive content. Most of the technologies use natural language and machine learning tools to identify offensive text. In a multilingual society, where code-mixing is a norm, the hate content would be delivered in a code-mixed form in social media, which makes the offensive content identification, further challenging. In this work, we participated in the EACL task to detect offensive content in the code-mixed social media scenario. The methodology uses a transformer model with transliteration and class balancing loss for offensive content identification. In this task, our model has been ranked 2nd in Malayalam-English and 4th in Tamil-English code-mixed languages.
pdf
abs
EDIOne@LT-EDI-EACL2021: Pre-trained Transformers with Convolutional Neural Networks for Hope Speech Detection.
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
Hope is an essential aspect of mental health stability and recovery in every individual in this fast-changing world. Any tools and methods developed for detection, analysis, and generation of hope speech will be beneficial. In this paper, we propose a model on hope-speech detection to automatically detect web content that may play a positive role in diffusing hostility on social media. We perform the experiments by taking advantage of pre-processing and transfer-learning models. We observed that the pre-trained multilingual-BERT model with convolution neural networks gave the best results. Our model ranked first, third, and fourth ranks on English, Malayalam-English, and Tamil-English code-mixed datasets.
pdf
abs
Gated Convolutional Sequence to Sequence Based Learning for English-Hingilsh Code-Switched Machine Translation.
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Code-Switching is the embedding of linguistic units or phrases from two or more languages in a single sentence. This phenomenon is practiced in all multilingual communities and is prominent in social media. Consequently, there is a growing need to understand code-switched translations by translating the code-switched text into one of the standard languages or vice versa. Neural Machine translation is a well-studied research problem in the monolingual text. In this paper, we have used the gated convolutional sequences to sequence networks for English-Hinglish translation. The convolutions in the model help to identify the compositional structure in the sequences more easily. The model relies on gating and performs multiple attention steps at encoder and decoder layers.
pdf
abs
A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Code-mixing (CM) is a frequently observed phenomenon that uses multiple languages in an utterance or sentence. There are no strict grammatical constraints observed in code-mixing, and it consists of non-standard variations of spelling. The linguistic complexity resulting from the above factors made the computational analysis of the code-mixed language a challenging task. Language identification (LI) and part of speech (POS) tagging are the fundamental steps that help analyze the structure of the code-mixed text. Often, the LI and POS tagging tasks are interdependent in the code-mixing scenario. We project the problem of dealing with multilingualism and grammatical structure while analyzing the code-mixed sentence as a joint learning task. In this paper, we jointly train and optimize language detection and part of speech tagging models in the code-mixed scenario. We used a Transformer with convolutional neural network architecture. We train a joint learning method by combining POS tagging and LI models on code-mixed social media text obtained from the ICON shared task.
2020
pdf
abs
Does a Hybrid Neural Network based Feature Selection Model Improve Text Classification?
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Text classification is a fundamental problem in the field of natural language processing. Text classification mainly focuses on giving more importance to all the relevant features that help classify the textual data. Apart from these, the text can have redundant or highly correlated features. These features increase the complexity of the classification algorithm. Thus, many dimensionality reduction methods were proposed with the traditional machine learning classifiers. The use of dimensionality reduction methods with machine learning classifiers has achieved good results. In this paper, we propose a hybrid feature selection method for obtaining relevant features by combining various filter-based feature selection methods and fastText classifier. We then present three ways of implementing a feature selection and neural network pipeline. We observed a reduction in training time when feature selection methods are used along with neural networks. We also observed a slight increase in accuracy on some datasets.
pdf
abs
Multilingual Pre-Trained Transformers and Convolutional NN Classification Models for Technical Domain Identification
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
In this paper, we present a transfer learning system to perform technical domain identification on multilingual text data. We have submitted two runs, one uses the transformer model BERT, and the other uses XLM-ROBERTa with the CNN model for text classification. These models allowed us to identify the domain of the given sentences for the ICON 2020 shared Task, TechDOfication: Technical Domain Identification. Our system ranked the best for the subtasks 1d, 1g for the given TechDOfication dataset.
pdf
bib
abs
Unsupervised Technical Domain Terms Extraction using Term Extractor
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task
Terminology extraction, also known as term extraction, is a subtask of information extraction. The goal of terminology extraction is to extract relevant words or phrases from a given corpus automatically. This paper focuses on the unsupervised automated domain term extraction method that considers chunking, preprocessing, and ranking domain-specific terms using relevance and cohesion functions for ICON 2020 shared task 2: TermTraction.
2015
pdf
A Semi Supervised Dialog Act Tagging for Telugu
Suman Dowlagar
|
Radhika Mamidi
Proceedings of the 12th International Conference on Natural Language Processing