Kavya G


2023

MUCS@DravidianLangTech2023: Leveraging Learning Models to Identify Abusive Comments in Code-mixed Dravidian Languages
Asha Hegde | Kavya G | Sharal Coelho | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Abusive language detection in user-generated online content has become a pressing concern due to its negative impact on users and the challenges it poses for policy makers. Online platforms must moderate abusive content to mitigate societal harm, adhere to legal requirements, and foster inclusivity. Despite the numerous methods developed for automated detection of abusive language, the problem persists. This ongoing challenge calls for further research and development to make abusive content detection systems more effective and to implement proactive measures that create safer and more respectful online spaces. To address the automatic detection of abusive language on social media platforms, this paper describes the models submitted by our team, MUCS, to the shared task “Abusive Comment Detection in Tamil and Telugu” at DravidianLangTech, at Recent Advances in Natural Language Processing (RANLP) 2023. This shared task addresses abusive comment detection in code-mixed Tamil, Telugu, and romanized Tamil (Tamil-English) texts. Two distinct models are submitted to the shared task for detecting abusive language in the given code-mixed texts: i) AbusiveML, a model implemented using the Linear Support Vector Classifier (LinearSVC) algorithm fed with n-grams of words and of character sequences within word boundaries (char_wb) as features, and ii) AbusiveTL, a Transfer Learning (TL) model with three different Bidirectional Encoder Representations from Transformers (BERT) models along with random oversampling to deal with data imbalance. Of the two, the AbusiveTL model performed better, with macro F1 scores of 0.46, 0.74, and 0.49 for code-mixed Tamil, Telugu, and Tamil-English texts respectively.
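
The abstract above mentions random oversampling to deal with data imbalance but gives no implementation details. The following is a minimal sketch of the general technique, not the authors' code: the function name `random_oversample`, the fixed seed, and the toy comments and labels are all hypothetical. It duplicates randomly chosen samples of each smaller class until every class is as large as the biggest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy placeholder data: three samples of one class, one of the other.
texts = ["comment a", "comment b", "comment c", "comment d"]
labels = ["none", "none", "none", "abusive"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.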

MUNLP@DravidianLangTech2023: Learning Approaches for Sentiment Analysis in Code-mixed Tamil and Tulu Text
Asha Hegde | Kavya G | Sharal Coelho | Pooja Lamani | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Sentiment Analysis (SA) examines the subjective content of a statement, such as opinions, assessments, feelings, or attitudes towards a subject, person, or thing. Though several models have been developed for SA in high-resource languages like English, Spanish, and German, under-resourced languages like the Dravidian languages are less explored. To address the challenges of SA in low-resource Dravidian languages, in this paper, we, team MUNLP, describe the models submitted to the “Sentiment Analysis in Tamil and Tulu - DravidianLangTech” shared task at Recent Advances in Natural Language Processing (RANLP) 2023. Three models, n-gramsSA, EmbeddingsSA, and BERTSA, are proposed for the SA shared task. Among these models, BERTSA achieved the highest macro F1 score of 0.26 for code-mixed Tamil texts, securing 2nd place in the shared task, and EmbeddingsSA achieved the highest macro F1 score of 0.53 for code-mixed Tulu texts, also securing 2nd place.
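
The results above are reported as macro F1 scores. As a rough illustration of why this metric can be low even when most predictions are correct, the sketch below computes macro F1 as the unweighted mean of per-class F1 over the labels seen in either list. The label names and predictions are invented for the example and are not taken from the shared task data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: a classifier that always predicts "pos".
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "pos", "pos", "pos"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

Because each class contributes equally to the mean, the two minority classes that are never predicted pull the macro F1 far below the accuracy.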

MUCSD@DravidianLangTech2023: Predicting Sentiment in Social Media Text using Machine Learning Techniques
Sharal Coelho | Asha Hegde | Pooja Lamani | Kavya G | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

User-generated social media texts are a blend of resource-rich languages like English and low-resource Dravidian languages like Tamil, Kannada, and Tulu. These texts, referred to as code-mixed texts, enrich social media since they are written in two or more languages using either a common script or multiple scripts. However, the complex nature of code-mixed text makes it difficult to process. In this paper, we, team MUCSD, describe the Machine Learning (ML) models submitted to the “Sentiment Analysis in Tamil and Tulu” shared task at DravidianLangTech@RANLP 2023. The proposed methodology uses ML models, namely Linear Support Vector Classifier (LinearSVC), Logistic Regression (LR), and an ensemble model (LR, Decision Tree (DT), and Support Vector Machine (SVM)), to perform Sentiment Analysis (SA) in Tamil and Tulu. The predictions of the proposed LinearSVC model, submitted to the shared task, obtained 8th and 9th rank for Tamil-English and Tulu-English respectively.
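
The abstract above names an ensemble of three classifiers but does not say how their outputs are combined. One common option is hard (majority) voting, sketched below as an assumption rather than as the authors' method. The function `majority_vote` and the per-model prediction lists are hypothetical stand-ins for real classifier outputs.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting.

    If all models disagree, the label predicted by the earliest model in the
    list wins, because Counter keeps insertion order among equal counts.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three already-trained classifiers.
lr_pred = ["positive", "negative", "neutral"]
dt_pred = ["positive", "positive", "negative"]
svm_pred = ["negative", "negative", "positive"]

ensemble_pred = majority_vote([lr_pred, dt_pred, svm_pred])
print(ensemble_pred)
```

Other combination schemes, such as averaging predicted probabilities, are equally plausible; the abstract does not specify which one was used.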

MUCS@DravidianLangTech2023: Malayalam Fake News Detection Using Machine Learning Approach
Sharal Coelho | Asha Hegde | Kavya G | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Social media is widely used to spread fake news, which affects a large population. Detecting fake news spread on social media platforms is therefore an important task. To address the challenges in identifying fake news in the Malayalam language, in this paper, we, team MUCS, describe the Machine Learning (ML) models submitted to the “Fake News Detection in Dravidian Languages” shared task at DravidianLangTech@RANLP 2023. Three models, namely Multinomial Naive Bayes (MNB), Logistic Regression (LR), and an ensemble model (MNB, LR, and SVM), are trained using Term Frequency-Inverse Document Frequency (TF-IDF) of word unigrams. Among the three models, the ensemble model performed best, with a macro F1-score of 0.83, and ranked 3rd in the shared task.
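
The models above are trained on TF-IDF of word unigrams. The sketch below shows how such weights could be computed for whitespace-separated unigrams. The weighting variant (raw term counts multiplied by a smoothed inverse document frequency, with no length normalization) is an assumption, since the abstract does not state one, and the English toy documents are placeholders rather than task data.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """Return one {term: weight} dict per document plus the IDF table.

    Assumed variant: weight = count(term in doc) * (ln((1 + N) / (1 + df)) + 1),
    where N is the number of documents and df the term's document frequency.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each unigram appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in df.items()
    }

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


# Toy placeholder documents.
docs = ["breaking news today", "news update", "fake news alert today"]
vectors, idf = tfidf_unigrams(docs)

for term in ("news", "today", "fake"):
    print(term, round(idf[term], 3))
```

A unigram that appears in every document, such as "news" here, receives the lowest weight, while unigrams confined to a single document receive the highest.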

MUCS@LT-EDI2023: Learning Approaches for Hope Speech Detection in Social Media Text
Asha Hegde | Kavya G | Sharal Coelho | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Hope plays a significant role in shaping human thoughts and actions, yet hope content has received limited attention in social media data analysis. Exploring hope content helps to uncover valuable insights into users’ aspirations, expectations, and emotional states. By analyzing hope content on social media platforms, researchers and analysts can gain a deeper understanding of how hope influences individuals’ behaviors, decisions, and overall well-being in the digital age. However, this area is rarely explored even for high-resource languages. To address the identification of hope text on social media platforms, this paper describes the models submitted by team MUCS to the “Hope Speech Detection for Equality, Diversity, and Inclusion (LT-EDI)” shared task organized at Recent Advances in Natural Language Processing (RANLP) 2023. This shared task aims to classify a comment or post in English and in code-mixed texts in three languages, namely Bulgarian, Spanish, and Hindi, into one of two predefined categories: “Hope speech” and “Non Hope speech”. Two models are proposed for the shared task: i) Hope_BERT, a Linear Support Vector Classifier (LinearSVC) model trained by combining Bidirectional Encoder Representations from Transformers (BERT) embeddings with Term Frequency-Inverse Document Frequency (TF-IDF) of character n-grams within word boundaries (char_wb), for English, and ii) Hope_mBERT, a LinearSVC model trained by combining Multilingual BERT (mBERT) embeddings with TF-IDF of char_wb, for Bulgarian, Spanish, and Hindi code-mixed texts. The proposed models obtained 1st, 1st, 2nd, and 5th ranks for Spanish, Bulgarian, Hindi, and English texts respectively.
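
The two models above combine dense embeddings with character n-grams taken within word boundaries (char_wb). The sketch below illustrates one plausible reading of that combination: each word is padded with a space on both sides, n-grams are extracted inside the padded word only, and the resulting sparse vector is concatenated with the dense vector. The helper names, the two-dimensional placeholder embedding, and the use of raw counts in place of TF-IDF weights are all assumptions made for brevity.

```python
from collections import Counter


def char_wb_ngrams(text, n_min=2, n_max=3):
    """Character n-grams that never cross a word boundary.

    Each whitespace-separated word is padded with one space on each side,
    so n-grams at the edges of a word carry a boundary marker.
    """
    ngrams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                ngrams.append(padded[i:i + n])
    return ngrams


def combine_features(dense_embedding, ngram_weights, vocabulary):
    """Concatenate a dense vector with a sparse n-gram vector laid out
    in the order given by `vocabulary` (hypothetical helper)."""
    sparse_part = [ngram_weights.get(term, 0.0) for term in vocabulary]
    return list(dense_embedding) + sparse_part


text = "stay strong"
grams = char_wb_ngrams(text, 3, 3)
counts = Counter(grams)            # raw counts stand in for TF-IDF weights
vocabulary = sorted(counts)

placeholder_embedding = [0.1, -0.2]  # stands in for a real sentence embedding
features = combine_features(placeholder_embedding, counts, vocabulary)
print(grams)
print(len(features))
```

The combined vector could then be passed to any linear classifier; in a real pipeline the vocabulary would be fixed on the training set and reused for unseen text.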

MUCS@LT-EDI2023: Homophobic/Transphobic Content Detection in Social Media Text using mBERT
Asha Hegde | Kavya G | Sharal Coelho | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Homophobic/Transphobic (H/T) content includes hate speech, discriminatory text, and abusive comments against Gay, Lesbian, Bisexual, Transgender, Queer, and Intersex (LGBTQ) individuals. With the increase in user-generated text on social media, there has been an increase in code-mixed H/T content, which poses challenges for efficient analysis and detection. The complex nature of code-mixed text necessitates advanced tools and techniques to tackle this issue effectively on social media platforms. In this paper, we, team MUCS, describe the transformer-based models submitted to the “Homophobia/Transphobia Detection in social media comments” shared task in Language Technology for Equality, Diversity and Inclusion (LT-EDI) at Recent Advances in Natural Language Processing (RANLP) 2023. The proposed methodology resamples the training data to handle data imbalance and uses the resampled data to fine-tune Multilingual Bidirectional Encoder Representations from Transformers (mBERT) models. These models obtained 11th, 5th, 3rd, 3rd, and 7th ranks for English, Tamil, Malayalam, Spanish, and Hindi respectively in Task A and 8th, 2nd, and 2nd ranks for English, Tamil, and Malayalam respectively in Task B.

MUCS@LT-EDI2023: Detecting Signs of Depression in Social Media Text
Sharal Coelho | Asha Hegde | Kavya G | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Depression can lead to significant changes in individuals’ posts on social media, and identifying these changes is an important task. Automated techniques are needed for this identification task, as manually analyzing the growing volume of social media data is time-consuming. To identify signs of depression in social media posts, in this paper, we, team MUCS, describe a Transfer Learning (TL) model and Machine Learning (ML) models submitted to the “Detecting Signs of Depression from Social Media Text” shared task organised by DepSign-LT-EDI@RANLP-2023. The TL model is trained on raw text using Bidirectional Encoder Representations from Transformers (BERT), and the ML models are trained separately using Term Frequency-Inverse Document Frequency (TF-IDF) features. Among these three models, the TL model performed better, with a macro averaged F1-score of 0.361, and ranked 20th in the shared task.