Udoy Das


2025

pdf bib
CUET-823@DravidianLangTech 2025: Shared Task on Multimodal Misogyny Meme Detection in Tamil Language
Arpita Mallik | Ratnajit Dhar | Udoy Das | Momtazul Arefin Labib | Samia Rahman | Hasan Murad
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Misogynous content on social media, especially in memes, present challenges due to the complex reciprocation of text and images that carry offensive messages. This difficulty mostly arises from the lack of direct alignment between modalities and biases in large-scale visio-linguistic models. In this paper, we present our system for the Shared Task on Misogyny Meme Detection - DravidianLangTech@NAACL 2025. We have implemented various unimodal models, such as mBERT and IndicBERT for text data, and ViT, ResNet, and EfficientNet for image data. Moreover, we have tried combining these models and finally adopted a multimodal approach that combined mBERT for text and EfficientNet for image features, both fine-tuned to better interpret subtle language and detailed visuals. The fused features are processed through a dense neural network for classification. Our approach achieved an F1 score of 0.78120, securing 4th place and demonstrating the potential of transformer-based architectures and state-of-the-art CNNs for this task.

pdf bib
Team ML_Forge@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
Adnan Faisal | Shiti Chowdhury | Sajib Bhattacharjee | Udoy Das | Samia Rahman | Momtazul Arefin Labib | Hasan Murad
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Ensuring a safe and inclusive online environment requires effective hate speech detection on social media. While detection systems have significantly advanced for English, many regional languages, including Malayalam, Tamil and Telugu, remain underrepresented, creating challenges in identifying harmful content accurately. These languages present unique challenges due to their complex grammar, diverse dialects, and frequent code-mixing with English. The rise of multimodal content, including text and audio, adds further complexity to detection tasks. The shared task “Multimodal Hate Speech Detection in Dravidian Languages: DravidianLangTech@NAACL 2025” has aimed to address these challenges. A Youtube-sourced dataset has been provided, labeled into five categories: Gender (G), Political (P), Religious (R), Personal Defamation (C) and Non-Hate (NH). In our approach, we have used mBERT, T5 for text and Wav2Vec2 and Whisper for audio. T5 has performed poorly compared to mBERT, which has achieved the highest F1 scores on the test dataset. For audio, Wav2Vec2 has been chosen over Whisper because it processes raw audio effectively using self-supervised learning. In the hate speech detection task, we have achieved a macro F1 score of 0.2005 for Malayalam, ranking 15th in this task, 0.1356 for Tamil and 0.1465 for Telugu, with both ranking 16th in this task.

pdf bib
CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models
Anindo Barua | Sidratul Muntaha | Momtazul Arefin Labib | Samia Rahman | Udoy Das | Hasan Murad
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Artificial Intelligence (AI) is opening new doors of learning and interaction. However, it has its share of problems. One major issue is the ability of AI to generate text that resembles human-written text. So, how can we tell apart human-written text from AI-generated text?With this in mind, we have worked on detecting AI-generated product reviews in Dravidian languages, mainly in Malayalam and Tamil. The “Shared Task on Detecting AI-Generated Product Reviews in Dravidian Languages,” held as part of the DravidianLangTech Workshop at NAACL 2025 has provided a dataset categorized into two categories, human-written review and AI-generated review. We have implemented four machine learning models (Random Forest, Support Vector Machine, Decision Tree, and XGBoost), four deep learning models (Long Short-Term Memory, Bidirectional Long Short-Term Memory, Gated Recurrent Unit, and Recurrent Neural Network), and three transformer-based models (AI-Human-Detector, Detect-AI-Text, and E5-Small-Lora-AI-Generated-Detector). We have conducted a comparative study among all the models by training and evaluating each model on the dataset. We have discovered that the transformer, E5-Small-Lora-AI-Generated-Detector, has provided the best result with an F1 score of 0.8994 on the test set ranking 7th position in the Malayalam language. Tamil has a higher token overlap and richer morphology than Malayalam. Thus, we obtained a worse F1 score of 0.5877 ranking 28th position in the Tamil language among all participants in the shared task.

2024

pdf bib
Fired_from_NLP at AraFinNLP 2024: Dual-Phase-BERT - A Fine-Tuned Transformer-Based Model for Multi-Dialect Intent Detection in The Financial Domain for The Arabic Language
Md. Sajid Alam Chowdhury | Mostak Mahmud Chowdhury | Anik Mahmud Shanto | Hasan Murad | Udoy Das
Proceedings of the Second Arabic Natural Language Processing Conference

In the financial industry, identifying user intent from text inputs is crucial for various tasks such as automated trading, sentiment analysis, and customer support. One important component of natural language processing (NLP) is intent detection, which is significant to the finance sector. Limited studies have been conducted in the field of finance using languages with limited resources like Arabic, despite notable works being done in high-resource languages like English. To advance Arabic NLP in the financial domain, the organizer of AraFinNLP 2024 has arranged a shared task for detecting banking intents from the queries in various Arabic dialects, introducing a novel dataset named ArBanking77 which includes a collection of banking queries categorized into 77 distinct intents classes. To accomplish this task, we have presented a hierarchical approach called Dual-Phase-BERT in which the detection of dialects is carried out first, followed by the detection of banking intents. Using the provided ArBanking77 dataset, we have trained and evaluated several conventional machine learning, and deep learning models along with some cutting-edge transformer-based models. Among these models, our proposed Dual-Phase-BERT model has ranked 7th out of all competitors, scoring 0.801 on the scale of F1-score on the test set.

pdf bib
CUET_sstm at ArAIEval Shared Task: Unimodal (Text) Propagandistic Technique Detection Using Transformer-Based Model
Momtazul Labib | Samia Rahman | Hasan Murad | Udoy Das
Proceedings of the Second Arabic Natural Language Processing Conference

In recent days, propaganda has started to influence public opinion increasingly as social media usage continues to grow. Our research has been part of the first challenge, Unimodal (Text) Propagandistic Technique Detection of ArAIEval shared task at the ArabicNLP 2024 conference, co-located with ACL 2024, identifying specific Arabic text spans using twenty-three propaganda techniques. We have augmented underrepresented techniques in the provided dataset using synonym replacement and have evaluated various machine learning (RF, SVM, MNB), deep learning (BiLSTM), and transformer-based models (bert-base-arabic, Marefa-NER, AraBERT) with transfer learning. Our comparative study has shown that the transformer model “bert-base-arabic” has outperformed other models. Evaluating the test set, it has achieved the micro-F1 score of 0.2995 which is the highest. This result has secured our team “CUET_sstm” first place among all participants in task 1 of the ArAIEval.

pdf bib
CUET_SSTM at the GEM’24 Summarization Task: Integration of extractive and abstractive method for long text summarization in Swahili language
Samia Rahman | Momtazul Arefin Labib | Hasan Murad | Udoy Das
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

Swahili, spoken by around 200 million people primarily in Tanzania and Kenya, has been the focus of our research for the GEM Shared Task at INLG’24 on Underrepresented Language Summarization. We have utilized the XLSUM dataset and have manually summarized 1000 texts from a Swahili news classification dataset. To achieve the desired results, we have tested abstractive summarizers (mT5_multilingual_XLSum, t5-small, mBART-50), and an extractive summarizer (based on PageRank algorithm). But our adopted model consists of an integrated extractive-abstractive model combining the Bert Extractive Summarizer with some abstractive summarizers (t5-small, mBART-50). The integrated model overcome the drawbacks of both the extractive and abstractive summarization system and utilizes the benefit from both of it. Extractive summarizer shorten the paragraphs exceeding 512 tokens, ensuring no important information has been lost before applying the abstractive models. The abstractive summarizer use its pretrained knowledge and ensure to generate context based summary.

pdf bib
Fired_from_NLP at SemEval-2024 Task 1: Towards Developing Semantic Textual Relatedness Predictor - A Transformer-based Approach
Anik Mahmud Shanto | Md. Sajid Alam Chowdhury | Mostak Mahmud Chowdhury | Udoy Das | Hasan Murad
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Predicting semantic textual relatedness (STR) is one of the most challenging tasks in the field of natural language processing. Semantic relatedness prediction has real-life practical applications while developing search engines and modern text generation systems. A shared task on semantic textual relatedness has been organized by SemEval 2024, where the organizer has proposed a dataset on semantic textual relatedness in the English language under Shared Task 1 (Track A3). In this work, we have developed models to predict semantic textual relatedness between pairs of English sentences by training and evaluating various transformer-based model architectures, deep learning, and machine learning methods using the shared dataset. Moreover, we have utilized existing semantic textual relatedness datasets such as the stsb multilingual benchmark dataset, the SemEval 2014 Task 1 dataset, and the SemEval 2015 Task 2 dataset. Our findings show that in the SemEval 2024 Shared Task 1 (Track A3), the fine-tuned-STS-BERT model performed the best, scoring 0.8103 on the test set and placing 25th out of all participants.

2023

pdf bib
EmptyMind at BLP-2023 Task 1: A Transformer-based Hierarchical-BERT Model for Bangla Violence-Inciting Text Detection
Udoy Das | Karnis Fatema | Md Ayon Mia | Mahshar Yahan | Md Sajidul Mowla | Md Fayez Ullah | Arpita Sarker | Hasan Murad
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

The availability of the internet has made it easier for people to share information via social media. People with ill intent can use this widespread availability of the internet to share violent content easily. A significant portion of social media users prefer using their regional language which makes it quite difficult to detect violence-inciting text. The objective of our research work is to detect Bangla violence-inciting text from social media content. A shared task on Bangla violence-inciting text detection has been organized by the First Bangla Language Processing Workshop (BLP) co-located with EMNLP, where the organizer has provided a dataset named VITD with three categories: nonviolence, passive violence, and direct violence text. To accomplish this task, we have implemented three machine learning models (RF, SVM, XGBoost), two deep learning models (LSTM, BiLSTM), and two transformer-based models (BanglaBERT, Hierarchical-BERT). We have conducted a comparative study among different models by training and evaluating each model on the VITD dataset. We have found that Hierarchical-BERT has provided the best result with an F1 score of 0.73797 on the test set and ranked 9th position among all participants in the shared task 1 of the BLP Workshop co-located with EMNLP 2023.

pdf bib
EmptyMind at BLP-2023 Task 2: Sentiment Analysis of Bangla Social Media Posts using Transformer-Based Models
Karnis Fatema | Udoy Das | Md Ayon Mia | Md Sajidul Mowla | Mahshar Yahan | Md Fayez Ullah | Arpita Sarker | Hasan Murad
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

With the popularity of social media platforms, people are sharing their individual thoughts by posting, commenting, and messaging with their friends, which generates a significant amount of digital text data every day. Conducting sentiment analysis of social media content is a vibrant research domain within the realm of Natural Language Processing (NLP), and it has practical, real-world uses. Numerous prior studies have focused on sentiment analysis for languages that have abundant linguistic resources, such as English. However, limited prior research works have been done for automatic sentiment analysis in low-resource languages like Bangla. In this research work, we are going to finetune different transformer-based models for Bangla sentiment analysis. To train and evaluate the model, we have utilized a dataset provided in a shared task organized by the BLP Workshop co-located with EMNLP-2023. Moreover, we have conducted a comparative study among different machine learning models, deep learning models, and transformer-based models for Bangla sentiment analysis. Our findings show that the BanglaBERT (Large) model has achieved the best result with a micro F1-Score of 0.7109 and secured 7th position in the shared task 2 leaderboard of the BLP Workshop in EMNLP 2023.