Hasan Murad

2024

pdf abs
Fired_from_NLP at SemEval-2024 Task 1: Towards Developing Semantic Textual Relatedness Predictor - A Transformer-based Approach
Anik Shanto | Md. Sajid Alam Chowdhury | Mostak Chowdhury | Udoy Das | Hasan Murad
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Predicting semantic textual relatedness (STR) is one of the most challenging tasks in the field of natural language processing. Semantic relatedness prediction has real-life practical applications while developing search engines and modern text generation systems. A shared task on semantic textual relatedness has been organized by SemEval 2024, where the organizer has proposed a dataset on semantic textual relatedness in the English language under Shared Task 1 (Track A3). In this work, we have developed models to predict semantic textual relatedness between pairs of English sentences by training and evaluating various transformer-based model architectures, deep learning, and machine learning methods using the shared dataset. Moreover, we have utilized existing semantic textual relatedness datasets such as the stsb multilingual benchmark dataset, the SemEval 2014 Task 1 dataset, and the SemEval 2015 Task 2 dataset. Our findings show that in the SemEval 2024 Shared Task 1 (Track A3), the fine-tuned-STS-BERT model performed the best, scoring 0.8103 on the test set and placing 25th out of all participants.

2023

The availability of the internet has made it easier for people to share information via social media. People with ill intent can use this widespread availability of the internet to share violent content easily. A significant portion of social media users prefer using their regional language which makes it quite difficult to detect violence-inciting text. The objective of our research work is to detect Bangla violence-inciting text from social media content. A shared task on Bangla violence-inciting text detection has been organized by the First Bangla Language Processing Workshop (BLP) co-located with EMNLP, where the organizer has provided a dataset named VITD with three categories: nonviolence, passive violence, and direct violence text. To accomplish this task, we have implemented three machine learning models (RF, SVM, XGBoost), two deep learning models (LSTM, BiLSTM), and two transformer-based models (BanglaBERT, Hierarchical-BERT). We have conducted a comparative study among different models by training and evaluating each model on the VITD dataset. We have found that Hierarchical-BERT has provided the best result with an F1 score of 0.73797 on the test set and ranked 9th position among all participants in the shared task 1 of the BLP Workshop co-located with EMNLP 2023.

With the popularity of social media platforms, people are sharing their individual thoughts by posting, commenting, and messaging with their friends, which generates a significant amount of digital text data every day. Conducting sentiment analysis of social media content is a vibrant research domain within the realm of Natural Language Processing (NLP), and it has practical, real-world uses. Numerous prior studies have focused on sentiment analysis for languages that have abundant linguistic resources, such as English. However, limited prior research works have been done for automatic sentiment analysis in low-resource languages like Bangla. In this research work, we are going to finetune different transformer-based models for Bangla sentiment analysis. To train and evaluate the model, we have utilized a dataset provided in a shared task organized by the BLP Workshop co-located with EMNLP-2023. Moreover, we have conducted a comparative study among different machine learning models, deep learning models, and transformer-based models for Bangla sentiment analysis. Our findings show that the BanglaBERT (Large) model has achieved the best result with a micro F1-Score of 0.7109 and secured 7th position in the shared task 2 leaderboard of the BLP Workshop in EMNLP 2023.

Co-authors

Md Fayez Ullah 2

Arpita Sarker 2

Anik Shanto 1

Md. Sajid Alam Chowdhury 1

Mostak Chowdhury 1

Venues

banglalp2
semeval1