Shirish Shekhar Jha

2023

ML&AI_IIITRanchi@LT-EDI-2023: Identification of Hope Speech of YouTube comments in Mixed Languages
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Hope speech analysis refers to the examination and evaluation of speeches or messages that aim to instill hope, inspire optimism, and motivate individuals or communities. It involves analyzing the content, language, rhetorical devices, and delivery techniques used in a speech to understand how it conveys hope and its potential impact on the audience. The objective of this study is to classify the given text comments as Hope Speech or Not Hope Speech. The provided dataset consists of YouTube comments in four languages (English, Hindi, Spanish, and Bulgarian) with pre-defined classifications. Our approach involved pre-processing the dataset and applying the TF-IDF (Term Frequency-Inverse Document Frequency) method.
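As a rough illustration of the TF-IDF weighting named in the abstract above, the following minimal sketch computes per-comment term weights in plain Python. The function name `tfidf`, the whitespace tokenization, the unsmoothed `ln(N / df)` inverse document frequency, and the sample comments are all assumptions made for illustration; the abstract does not specify the authors' pre-processing or TF-IDF variant.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes lowercased whitespace tokenization and idf = ln(N / df).
    Toolkits often use smoothed variants, so absolute values may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical comments, not taken from the shared-task dataset.
comments = [
    "stay strong things will get better",
    "this video is boring",
    "things will get better soon",
]
weights = tfidf(comments)
```

Terms that occur in fewer comments receive a larger weight than terms of equal frequency that are spread across many comments, which is what makes the resulting vectors usable as classifier input.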

ML&AI_IIITRanchi@LT-EDI-2023: Hybrid Model for Text Classification for Identification of Various Types of Depression
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

DepSign–LT–EDI@RANLP–2023 is a dedicated task that addresses the crucial issue of identifying indications of depression in individuals through their social media posts, which serve as a platform for expressing their emotions and sentiments. The primary objective is to accurately classify the signs of depression into three distinct categories: “not depressed,” “moderately depressed,” and “severely depressed.” Our study used machine learning algorithms coupled with a diverse range of features such as sentence embeddings, TF-IDF, and Bag-of-Words. The adoption of hybrid models yielded promising outcomes, culminating in a 10th rank with a macro F1-score of 0.408. This research underscores the effectiveness and potential of employing advanced text classification methodologies to discern and identify signs of depression within social media data. The findings hold implications for the development of mental health monitoring systems and support mechanisms, contributing to the well-being of individuals in need.
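For readers unfamiliar with the macro F1-score reported above, the sketch below shows how the metric is conventionally computed: the F1 of each class is calculated separately and the results are averaged with equal weight. The function name `macro_f1`, the short label strings, and the toy predictions are illustrative assumptions, not the task's official scorer or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for the three categories.
gold = ["not", "moderate", "severe", "moderate"]
pred = ["not", "moderate", "moderate", "moderate"]
score = macro_f1(gold, pred)
```

Because every class counts equally, a class that is never predicted correctly (here "severe") pulls the average down sharply even when most individual predictions are right.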

ML&AI_IIITRanchi@DravidianLangTech: Fine-Tuning IndicBERT for Exploring Language-specific Features for Sentiment Classification in Code-Mixed Dravidian Languages
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Code-mixing presents challenges for sentiment analysis because annotated data is scarce for low-resource languages such as Tulu. To address this issue, a gold-standard labeled corpus covering both Tamil and Tulu was created to support accurate sentiment analysis. The work involved data collection, cleaning, and preprocessing leading up to annotation, followed by fine-tuning IndicBERT and running experiments with TF-IDF and Bag-of-Words features. The outcome is a resource for developing models tailored to sentiment analysis of code-mixed Tamil and Tulu text, offering focused insight into what makes up such expressions. The adoption of hybrid models yielded promising outcomes, culminating in a 10th rank for Tulu and a 14th rank for Tamil, with macro F1 scores of 0.471 and 0.124 respectively.

ML&AI_IIITRanchi@DravidianLangTech: Leveraging Transfer Learning for the discernment of Fake News within the Linguistic Domain of Dravidian Language
Kirti Kumari | Shirish Shekhar Jha | Zarikunte Kunal Dayanand | Praneesh Sharma
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

The primary focus of this research lies in detecting and mitigating misinformation within the Dravidian language. Fine-tuning the Indic BERT model secured fourth rank in a competition organized by DravidianLangTech 2023, with a macro F1-score of 0.78. To facilitate this undertaking, a diverse dataset was gathered from prominent social media platforms, including but not limited to Facebook and Twitter. The overarching objective was to categorize news articles as either true or fake by applying advanced machine learning techniques together with the distinctive linguistic characteristics of the Dravidian language.
