Ilan Meyrowitsch


2022

In this paper, we describe our submissions to SemEval-2022 subtask 4-A, “Patronizing and Condescending Language Detection: Binary Classification”. We developed several models for this subtask, applying 11 supervised machine learning methods and 9 preprocessing methods. Our best submission was a model built with BertForSequenceClassification. Our experiments indicate that a preprocessing stage is essential for a successful model. The dataset for this subtask is highly imbalanced. The F1-scores obtained on the oversampled training dataset were higher than those obtained on the original training dataset.
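The abstract does not say which oversampling method was used for this subtask. As a minimal sketch, assuming plain random oversampling with replacement, the balancing step could look like the following. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        # All original examples that belong to this class.
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


# Toy data (hypothetical): 1 = patronizing, 0 = not patronizing.
texts = ["a", "b", "c", "d", "e"]
labels = [0, 0, 0, 0, 1]
bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated examples do not leak into the data used for evaluation.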
In this paper, we describe our submissions to the SemEval-2022 contest. We tackled subtask 6-A, “iSarcasmEval: Intended Sarcasm Detection In English and Arabic – Binary Classification”. We developed separate models for the two languages, English and Arabic. We applied 4 supervised machine learning methods, 6 preprocessing methods for English and 3 for Arabic, and 3 oversampling methods. Our best submitted model for the English test dataset was an SVC model trained on data that was balanced using SMOTE and had stop words removed. For the Arabic test dataset, our best submitted model was an SVC model whose preprocessing removed elongation.
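To illustrate the two English preprocessing and balancing steps named above, here is a minimal, dependency-free sketch of stop-word removal and of SMOTE-style interpolation between minority-class feature vectors. It is not the authors' implementation: the stop-word list, the function names `remove_stop_words` and `smote_like`, and the toy vectors are all assumptions made for illustration. In practice one would more likely use an existing SMOTE implementation (for example the one in imbalanced-learn) together with scikit-learn's `SVC`.

```python
import math
import random

# Illustrative subset only; not the stop-word list used in the paper.
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}


def remove_stop_words(text):
    """Lower-case the text and drop tokens found in STOP_WORDS."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest other minority points by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy minority-class feature vectors (hypothetical).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, n_new=4)
cleaned = remove_stop_words("The task is hard")
print(cleaned, len(new_points))
```

Unlike random oversampling, this approach produces new feature vectors rather than exact copies, which is why it operates on vectorized text (for example TF-IDF features) rather than on raw strings.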