Eftekhar Hossain


2021

pdf bib
NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers
Omar Sharif | Eftekhar Hossain | Mohammed Moshiul Hoque
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

The increasing accessibility of the internet facilitated social media usage and encouraged individuals to express their opinions liberally. Nevertheless, it also creates a place for content polluters to disseminate offensive posts or contents. Most of such offensive posts are written in a cross-lingual manner and can easily evade the online surveillance systems. This paper presents an automated system that can identify offensive text from multilingual code-mixed data. In the task, datasets provided in three languages including Tamil, Malayalam and Kannada code-mixed with English where participants are asked to implement separate models for each language. To accomplish the tasks, we employed two machine learning techniques (LR, SVM), three deep learning (LSTM, LSTM+Attention) techniques and three transformers (m-BERT, Indic-BERT, XLM-R) based methods. Results show that XLM-R outperforms other techniques in Tamil and Malayalam languages while m-BERT achieves the highest score in the Kannada language. The proposed models gained weighted f_1 score of 0.76 (for Tamil), 0.93 (for Malayalam ), and 0.71 (for Kannada) with a rank of 3rd, 5th and 4th respectively.

pdf bib
NLP-CUET@DravidianLangTech-EACL2021: Investigating Visual and Textual Features to Identify Trolls from Multimodal Social Media Memes
Eftekhar Hossain | Omar Sharif | Mohammed Moshiul Hoque
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

In the past few years, the meme has become a new way of communication on the Internet. As memes are in images forms with embedded text, it can quickly spread hate, offence and violence. Classifying memes are very challenging because of their multimodal nature and region-specific interpretation. A shared task is organized to develop models that can identify trolls from multimodal social media memes. This work presents a computational model that we developed as part of our participation in the task. Training data comes in two forms: an image with embedded Tamil code-mixed text and an associated caption. We investigated the visual and textual features using CNN, VGG16, Inception, m-BERT, XLM-R, XLNet algorithms. Multimodal features are extracted by combining image (CNN, ResNet50, Inception) and text (Bi-LSTM) features via early fusion approach. Results indicate that the textual approach with XLNet achieved the highest weighted f_1-score of 0.58, which enable our model to secure 3rd rank in this task.

pdf bib
NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
Eftekhar Hossain | Omar Sharif | Mohammed Moshiul Hoque
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

In recent years, several systems have been developed to regulate the spread of negativity and eliminate aggressive, offensive or abusive contents from the online platforms. Nevertheless, a limited number of researches carried out to identify positive, encouraging and supportive contents. In this work, our goal is to identify whether a social media post/comment contains hope speech or not. We propose three distinct models to identify hope speech in English, Tamil and Malayalam language to serve this purpose. To attain this goal, we employed various machine learning (SVM, LR, ensemble), deep learning (CNN+BiLSTM) and transformer (m-BERT, Indic-BERT, XLNet, XLM-R) based methods. Results indicate that XLM-R outdoes all other techniques by gaining a weighted f_1-score of 0.93, 0.60 and 0.85 respectively for English, Tamil and Malayalam language. Our team has achieved 1st, 2nd and 1st rank in these three tasks respectively.

2020

pdf bib
TechTexC: Classification of Technical Texts using Convolution and Bidirectional Long Short Term Memory Network
Omar Sharif | Eftekhar Hossain | Mohammed Moshiul Hoque
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task

This paper illustrates the details description of technical text classification system and its results that developed as a part of participation in the shared task TechDofication 2020. The shared task consists of two sub-tasks: (i) first task identify the coarse-grained technical domain of given text in a specified language and (ii) the second task classify a text of computer science domain into fine-grained sub-domains. A classification system (called ‘TechTexC’) is developed to perform the classification task using three techniques: convolution neural network (CNN), bidirectional long short term memory (BiLSTM) network, and combined CNN with BiLSTM. Results show that CNN with BiLSTM model outperforms the other techniques concerning task-1 of sub-tasks (a, b, c and g) and task-2a. This combined model obtained f1 scores of 82.63 (sub-task a), 81.95 (sub-task b), 82.39 (sub-task c), 84.37 (sub-task g), and 67.44 (task-2a) on the development dataset. Moreover, in the case of test set, the combined CNN with BiLSTM approach achieved that higher accuracy for the subtasks 1a (70.76%), 1b (79.97%), 1c (65.45%), 1g (49.23%) and 2a (70.14%).