Matan Fchima


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2022

pdf bib
JCT at SemEval-2022 Task 4-A: Patronism Detection in Posts Written in English using Preprocessing Methods and various Machine Leaerning Methods
Yaakov HaCohen-Kerner | Ilan Meyrowitsch | Matan Fchima
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe our submissions to SemEval-2022 subtask 4-A - “Patronizing and Condescending Language Detection: Binary Classification”. We developed different models for this subtask. We applied 11 supervised machine learning methods and 9 preprocessing methods. Our best submission was a model we built with BertForSequenceClassification. Our experiments indicate that pre-processing stage is a must for a successful model. The dataset for Subtask 1 is highly imbalanced dataset. The f1-scores on the oversampled imbalanced training dataset were higher the results on the original training dataset.

pdf bib
JCT at SemEval-2022 Task 6-A: Sarcasm Detection in Tweets Written in English and Arabic using Preprocessing Methods and Word N-grams
Yaakov HaCohen-Kerner | Matan Fchima | Ilan Meyrowitsch
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe our submissions to SemEval-2022 contest. We tackled subtask 6-A - “iSarcasmEval: Intended Sarcasm Detection In English and Arabic – Binary Classification”. We developed different models for two languages: English and Arabic. We applied 4 supervised machine learning methods, 6 preprocessing methods for English and 3 for Arabic, and 3 oversampling methods. Our best submitted model for the English test dataset was a SVC model that balanced the dataset using SMOTE and removed stop words. For the Arabic test dataset our best submitted model was a SVC model that preprocessed removed longation.