2023
pdf
bib
abs
Hate and Offensive Keyword Extraction from CodeMix Malayalam Social Media Text Using Contextual Embedding
Mariya Raphel
|
Premjith B
|
Sreelakshmi K
|
Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
This paper focuses on identifying hate and offensive keywords from codemix Malayalam social media text. As part of this work, a dataset for hate and offensive keyword extraction for codemix Malayalam language was created. Two different methods were experimented to extract Hate and Offensive language (HOL) keywords from social media text. In the first method, intrinsic evaluation was performed on the dataset to identify the hate and offensive keywords. Three different approaches namely – unigram approach, bigram approach and trigram approach were performed to extract the HOL keywords, sequence of HOL words and the sequence that contribute HOL meaning even in the absence of a HOL word. Five different transformer models were used in each of the pproaches for extracting the embeddings for the ngrams. Later, HOL keywords were extracted based on the similarity score obtained using the cosine similarity. Out of the five transformer models, the best results were obtained with multilingual BERT. In the second method, multilingual BERT transformer model was fine tuned with the dataset to develop a HOL keyword tagger model. This work is a new beginning for HOL keyword identification in Dravidian language – Malayalam.
2022
pdf
abs
Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian Languages
Premjith B
|
Bharathi Raja Chakravarthi
|
Malliga Subramanian
|
Bharathi B
|
Soman Kp
|
Dhanalakshmi V
|
Sreelakshmi K
|
Arunaggiri Pandian
|
Prasanna Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
This paper presents the findings of the shared task on Multimodal Sentiment Analysis and Troll meme classification in Dravidian languages held at ACL 2022. Multimodal sentiment analysis deals with the identification of sentiment from video. In addition to video data, the task requires the analysis of corresponding text and audio features for the classification of movie reviews into five classes. We created a dataset for this task in Malayalam and Tamil. The Troll meme classification task aims to classify multimodal Troll memes into two categories. This task assumes the analysis of both text and image features for making better predictions. The performance of the participating teams was analysed using the F1-score. Only one team submitted their results in the Multimodal Sentiment Analysis task, whereas we received six submissions in the Troll meme classification task. The only team that participated in the Multimodal Sentiment Analysis shared task obtained an F1-score of 0.24. In the Troll meme classification task, the winning team achieved an F1-score of 0.596.
2021
pdf
abs
Amrita_CEN_NLP@DravidianLangTech-EACL2021: Deep Learning-based Offensive Language Identification in Malayalam, Tamil and Kannada
Sreelakshmi K
|
Premjith B
|
Soman Kp
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
This paper describes the submission of the team Amrita_CEN_NLP to the shared task on Offensive Language Identification in Dravidian Languages at EACL 2021. We implemented three deep neural network architectures such as a hybrid network with a Convolutional layer, a Bidirectional Long Short-Term Memory network (Bi-LSTM) layer and a hidden layer, a network containing a Bi-LSTM and another with a Bidirectional Recurrent Neural Network (Bi-RNN). In addition to that, we incorporated a cost-sensitive learning approach to deal with the problem of class imbalance in the training data. Among the three models, the hybrid network exhibited better training performance, and we submitted the predictions based on the same.