2022
pdf
abs
High Tech team at SemEval-2022 Task 6: Intended Sarcasm Detection for Arabic texts
Hamza Alami
|
Abdessamad Benlahbib
|
Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents our proposed methods for the iSarcasmEval shared task. The shared task consists of three different subtasks. We participate in both subtask A and subtask C. The purpose of subtask A was to predict if a text is sarcastic while the aim of subtask C is to determine which text is sarcastic given a sarcastic text and its non-sarcastic rephrase. Both of the developed solutions used BERT pre-trained models. The proposed models are optimized on simple objectives and are easy to grasp. However, despite their simplicity, our methods ranked 4 and 2 in iSarcasmEval subtask A and subtask C for Arabic texts.
pdf
abs
LISACTeam at SemEval-2022 Task 6: A Transformer based Approach for Intended Sarcasm Detection in English Tweets
Abdessamad Benlahbib
|
Hamza Alami
|
Ahmed Alami
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
In this paper, we present our system and findings for SemEval-2022 Task 6 - iSarcasmEval: Intended Sarcasm Detection in English. The main objective of this task was to identify sarcastic tweets. This task was challenging mainly due to (1) the small training dataset that contains only 3468 tweets and (2) the imbalanced class distribution (25% sarcastic and 75% non-sarcastic). Our submitted model (ranked eighth on Sub-Task A and fifth on Sub-Task C) consists of a Transformer-based approach (BERTweet model).
2021
pdf
abs
LISAC FSDM USMBA at SemEval-2021 Task 5: Tackling Toxic Spans Detection Challenge with Supervised SpanBERT-based Model and Unsupervised LIME-based Model
Abdessamad Benlahbib
|
Ahmed Alami
|
Hamza Alami
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Toxic spans detection is an emerging challenge that aims to find toxic spans within a toxic text. In this paper, we describe our solutions to tackle toxic spans detection. The first solution, which follows a supervised approach, is based on SpanBERT model. This latter is intended to better embed and predict spans of text. The second solution, which adopts an unsupervised approach, combines linear support vector machine with the Local Interpretable Model-Agnostic Explanations (LIME). This last is used to interpret predictions of learning-based models. Our supervised model outperformed the unsupervised model and achieved the f-score of 67,84% (ranked 22/85) in Task 5 at SemEval-2021: Toxic Spans Detection.
2020
pdf
abs
Weighted combination of BERT and N-GRAM features for Nuanced Arabic Dialect Identification
Abdellah El Mekki
|
Ahmed Alami
|
Hamza Alami
|
Ahmed Khoumsi
|
Ismail Berrada
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Around the Arab world, different Arabic dialects are spoken by more than 300M persons, and are increasingly popular in social media texts. However, Arabic dialects are considered to be low-resource languages, limiting the development of machine-learning based systems for these dialects. In this paper, we investigate the Arabic dialect identification task, from two perspectives: country-level dialect identification from 21 Arab countries, and province-level dialect identification from 100 provinces. We introduce an unified pipeline of state-of-the-art models, that can handle the two subtasks. Our experimental studies applied to the NADI shared task, show promising results both at the country-level (F1-score of 25.99%) and the province-level (F1-score of 6.39%), and thus allow us to be ranked 2nd for the country-level subtask, and 1st in the province-level subtask.
pdf
abs
LISAC FSDM-USMBA Team at SemEval-2020 Task 12: Overcoming AraBERT’s pretrain-finetune discrepancy for Arabic offensive language identification
Hamza Alami
|
Said Ouatik El Alaoui
|
Abdessamad Benlahbib
|
Noureddine En-nahnahi
Proceedings of the Fourteenth Workshop on Semantic Evaluation
AraBERT is an Arabic version of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model. The latter has achieved good performance in a variety of Natural Language Processing (NLP) tasks. In this paper, we propose an effective AraBERT embeddings-based method for dealing with offensive Arabic language in Twitter. First, we pre-process tweets by handling emojis and including their Arabic meanings. To overcome the pretrain-finetune discrepancy, we substitute each detected emojis by the special token [MASK] into both fine tuning and inference phases. Then, we represent tweets tokens by applying AraBERT model. Finally, we feed the tweet representation into a sigmoid function to decide whether a tweet is offensive or not. The proposed method achieved the best results on OffensEval 2020: Arabic task and reached a macro F1 score equal to 90.17%.