2025
CUET_INSights@NLU of Devanagari Script Languages 2025: Leveraging Transformer-based Models for Target Identification in Hate Speech
Farjana Alam Tofa | Lorin Tasnim Zeba | Md Osama | Ashim Dey
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Hate speech detection in multilingual content is a challenging problem, especially when it comes to understanding the specific targets of hateful expressions. Identifying the targets of hate speech, whether directed at individuals, organizations, or communities, is crucial for effective content moderation and for understanding context. A shared task on hate speech detection in Devanagari Script Languages, organized by CHiPSAL@COLING 2025, allowed us to address the challenge of identifying the target of hate speech in Devanagari script text. For this task, we experimented with various machine learning (ML) and deep learning (DL) models, including Logistic Regression, Decision Trees, Random Forest, SVM, CNN, LSTM, BiLSTM, and transformer-based models such as MiniLM, m-BERT, and Indic-BERT. Our experiments demonstrated that Indic-BERT achieved the highest F1-score of 0.69, ranking 3rd in the shared task. This research contributes to advancing hate speech detection and natural language processing in low-resource languages.
CUET_Novice@DravidianLangTech 2025: A Multimodal Transformer-Based Approach for Detecting Misogynistic Memes in Malayalam Language
Khadiza Sultana Sayma | Farjana Alam Tofa | Md Osama | Ashim Dey
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Memes, combining images and text, are a popular social media medium that can spread humor or harmful content, including misogyny—hatred or discrimination against women. Detecting misogynistic memes in Malayalam is challenging due to their multimodal nature, requiring analysis of both visual and textual elements. A Shared Task on Misogyny Meme Detection, organized as part of DravidianLangTech@NAACL 2025, aimed to address this issue by promoting the advancement of multimodal machine learning models for classifying Malayalam memes as misogynistic or non-misogynistic. In this work, we explored visual, textual, and multimodal approaches for meme classification. CNN, ResNet50, Vision Transformer (ViT), and Swin Transformer were used for visual feature extraction, while mBERT, IndicBERT, and MalayalamBERT were employed for textual analysis. Additionally, we experimented with multimodal fusion models, including IndicBERT+ViT, MalayalamBERT+ViT, and MalayalamBERT+Swin. Among these, our MalayalamBERT+Swin Transformer model performed best, achieving the highest weighted F1-score of 0.87631, securing 1st place in the competition. Our results highlight the effectiveness of multimodal learning in detecting misogynistic Malayalam memes and the need for robust AI models in low-resource languages.
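The multimodal fusion described above can be illustrated with a minimal late-fusion sketch. This is a simplified illustration, not the paper's implementation: the toy vectors stand in for MalayalamBERT and Swin Transformer embeddings, and the classifier weights are arbitrary placeholders.

```python
import math

def concat_fuse(text_vec, image_vec):
    """Late fusion: concatenate a text embedding with an image embedding.
    The inputs stand in for MalayalamBERT and Swin Transformer features."""
    return list(text_vec) + list(image_vec)

def classify(features, weights, bias):
    """Linear layer followed by a sigmoid over the fused features,
    yielding a probability that the meme is misogynistic."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-d text and 1-d image embeddings with placeholder weights.
fused = concat_fuse([0.1, 0.2], [0.3])
score = classify(fused, [1.0, 1.0, 1.0], 0.0)
```

In practice the fused vector would be high-dimensional and the classifier trained end-to-end with the two encoders; the sketch only shows the concatenate-then-classify structure.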
CUET_Novice@DravidianLangTech 2025: Abusive Comment Detection in Malayalam Text Targeting Women on Social Media Using Transformer-Based Models
Farjana Alam Tofa | Khadiza Sultana Sayma | Md Osama | Ashim Dey
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Social media has become a widely used platform for communication and entertainment, but it has also become a space where abuse and harassment can thrive. Women, in particular, face hateful and abusive comments that reflect gender inequality. This paper discusses our participation in the Abusive Text Targeting Women in Dravidian Languages shared task at DravidianLangTech@NAACL 2025, which focuses on detecting abusive text targeting women in Malayalam social media comments. The shared task provided a dataset of YouTube comments in Tamil and Malayalam, focusing on sensitive and controversial topics where abusive behavior is prevalent. Our participation focused on the Malayalam dataset, where the goal was to accurately classify comments as abusive or non-abusive. Malayalam-BERT achieved the best performance on the subtask, securing 3rd place with a macro F1-score of 0.7083, highlighting the effectiveness of transformer models for low-resource languages. These results contribute to tackling gender-based abuse and improving online content moderation for underrepresented languages.
CUET_Novice@DravidianLangTech 2025: A Bi-GRU Approach for Multiclass Political Sentiment Analysis of Tamil Twitter (X) Comments
Arupa Barua | Md Osama | Ashim Dey
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Political sentiment analysis in multilingual content poses significant challenges in capturing the subtle variations of diverse sentiments expressed in complex and low-resourced languages. Accurately classifying sentiments, whether positive, negative, or neutral, is crucial for understanding public discourse. A shared task on Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments, organized by DravidianLangTech@NAACL 2025, provided an opportunity to tackle these challenges. For this task, we implemented two data augmentation techniques, synonym replacement and back translation, and then explored various machine learning (ML) algorithms, including Logistic Regression, Decision Tree, Random Forest, SVM, and Multinomial Naive Bayes. To capture semantic meaning more effectively, we experimented with deep learning (DL) models, including GRU, BiLSTM, BiGRU, and a hybrid CNN-BiLSTM. The Bidirectional Gated Recurrent Unit (BiGRU) achieved the best macro-F1 (MF1) score of 0.33, securing 17th position in the shared task. These findings underscore the challenges of political sentiment analysis in low-resource languages and the need for advanced language-specific models for improved classification.
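The synonym-replacement augmentation mentioned above can be sketched as follows. This is a toy illustration, not the paper's pipeline: the synonym table is a hypothetical placeholder (a real setup would use a Tamil lexicon or embedding-based neighbors), and the sampling is seeded for reproducibility.

```python
import random

# Hypothetical toy synonym table; a real setup would use a
# language-specific lexicon for Tamil.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

def synonym_replace(tokens, n=1, rng=None):
    """Return a copy of `tokens` with up to `n` words that have known
    synonyms replaced by a randomly chosen synonym."""
    rng = rng or random.Random(0)
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out
```

Each augmented sentence keeps its original sentiment label, so the technique enlarges the training set without extra annotation; back translation works analogously by round-tripping the sentence through a pivot language.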