2023
ML&AI_IIITRanchi@DravidianLangTech: Fine-Tuning IndicBERT for Exploring Language-specific Features for Sentiment Classification in Code-Mixed Dravidian Languages
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Code-mixing poses challenges for sentiment analysis because annotated data is scarce for low-resource languages such as Tulu. To address this issue, comprehensive work was done to create a gold-standard labeled corpus that incorporates both languages and supports accurate sentiment analysis. The research employed a range of techniques, including data collection, cleaning, preprocessing, and annotation, followed by fine-tuning IndicBERT and running experiments with TF-IDF and bag-of-words features. The outcome is a valuable resource for developing models tailored to sentiment analysis of code-mixed Tamil and Tulu text, allowing focused insight into what makes up such expressions. The adoption of hybrid models yielded promising outcomes, culminating in 10th rank for Tulu and 14th rank for Tamil, with macro F1 scores of 0.471 and 0.124, respectively.
ML&AI_IIITRanchi@DravidianLangTech: Leveraging Transfer Learning for the discernment of Fake News within the Linguistic Domain of Dravidian Language
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
This research focuses on detecting and mitigating misinformation within the Dravidian language. Fine-tuning the Indic BERT model secured fourth rank in the competition organized by DravidianLangTech 2023, with a macro F1-score of 0.78. To support this work, a diverse and comprehensive dataset was gathered from prominent social media platforms, including Facebook and Twitter. The overall objective of this collaborative initiative was to classify news articles as either true or deceptive by applying advanced machine learning techniques together with the distinctive linguistic characteristics of the Dravidian language.
ML&AI_IIITRanchi@LT-EDI-2023: Identification of Hope Speech of YouTube comments in Mixed Languages
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Hope speech analysis refers to the examination and evaluation of speeches or messages that aim to instill hope, inspire optimism, and motivate individuals or communities. It involves analyzing the content, language, rhetorical devices, and delivery techniques used in a speech to understand how it conveys hope and its potential impact on the audience. The objective of this study is to classify the given text comments as Hope Speech or Not Hope Speech. The provided dataset consists of YouTube comments in four languages (English, Hindi, Spanish, and Bulgarian) with predefined class labels. Our approach involved preprocessing the dataset and using the TF-IDF (Term Frequency-Inverse Document Frequency) method.
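The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `tfidf`, the whitespace tokenizer, the smoothed IDF variant, and the toy comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes raw term counts for TF and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is one common variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy comments, not taken from the shared-task dataset.
comments = [
    "stay strong we believe in you",
    "this video is boring",
    "we believe things will improve",
]
vectors = tfidf(comments)
```

Terms that occur in fewer comments receive a larger weight, so in the first toy comment "strong" outweighs "we". The resulting weight dictionaries could then be fed to any standard classifier.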
ML&AI_IIITRanchi@LT-EDI-2023: Hybrid Model for Text Classification for Identification of Various Types of Depression
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
DepSign-LT-EDI@RANLP-2023 is a dedicated task that addresses the crucial issue of identifying indications of depression in individuals through their social media posts, which serve as a platform for expressing their emotions and sentiments. The primary objective is to classify the signs of depression accurately into three distinct categories: “not depressed,” “moderately depressed,” and “severely depressed.” Our study used machine learning algorithms with a diverse range of features such as sentence embeddings, TF-IDF, and Bag-of-Words. The adoption of hybrid models yielded promising outcomes, culminating in a 10th rank achievement, supported by a macro F1-score of 0.408. This research underscores the effectiveness and potential of employing advanced text classification methodologies to identify signs of depression within social media data. The findings hold implications for the development of mental health monitoring systems and support mechanisms, contributing to the well-being of individuals in need.
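The macro F1-score reported here is the unweighted mean of per-class F1 scores, so each of the three classes counts equally regardless of how many posts it contains. The sketch below shows the computation on made-up predictions. The function name `macro_f1`, the short label names, and the toy labels are assumptions, and nothing in it reproduces the paper's data or results.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical gold labels and predictions for the three depression classes.
labels = ["not", "moderate", "severe"]
y_true = ["not", "not", "moderate", "moderate", "severe", "severe"]
y_pred = ["not", "moderate", "moderate", "moderate", "severe", "not"]
score = macro_f1(y_true, y_pred, labels)
```

On this toy data the per-class F1 scores are 0.5, 0.8, and 2/3, so the macro average is about 0.656.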
2022
Bias, Threat and Aggression Identification Using Machine Learning Techniques on Multilingual Comments
Kirti Kumari | Shaury Srivastav | Rajiv Ranjan Suman
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)
In this paper, we present our team "IIITRanchi" for the Trolling, Aggression and Cyberbullying (TRAC-3) 2022 shared tasks. Aggression in its different forms has grown tremendously on social media and other online platforms. In this work, we explore different aspects of aggression, aggression intensity, and bias of different forms, their usage online, and their identification using different machine learning techniques. We classified each sample on seven different tasks, namely aggression level, aggression intensity, discursive role, gender bias, religious bias, caste/class bias and ethnicity/racial bias, as specified in the shared tasks. Both of our teams tried machine learning classifiers and achieved good results. Overall, our team "IIITRanchi" ranked first in this shared task competition.
2020
AI_ML_NIT_Patna @ TRAC - 2: Deep Learning Approach for Multi-lingual Aggression Identification
Kirti Kumari | Jyoti Prakash Singh
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
This paper describes the models developed by team AI_ML_NIT_Patna for the TRAC - 2 shared task and their results. The main objective of the task is to identify the level of aggression of a comment and whether the comment is gendered or not. The aggression level of each comment can be marked as Overtly aggressive, Covertly aggressive, or Non-aggressive. We propose two deep learning systems, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, with two different input text representations, FastText and One-hot embeddings. We found that the LSTM model with FastText embedding performs better than the other models on the Hindi and Bangla datasets, whereas on the English dataset the CNN model with FastText embedding performs better. We also found that the performances of One-hot embedding and pre-trained FastText embedding are comparable. Among all submitted systems, our system ranked 11th and 10th for English Sub-task A and Sub-task B, respectively, 8th and 7th for Hindi Sub-task A and Sub-task B, respectively, and 7th and 6th for Bangla Sub-task A and Sub-task B, respectively.