JCT at SemEval-2022 Task 6-A: Sarcasm Detection in Tweets Written in English and Arabic using Preprocessing Methods and Word N-grams

Yaakov HaCohen-Kerner, Matan Fchima, Ilan Meyrowitsch


Abstract
In this paper, we describe our submissions to the SemEval-2022 contest. We tackled subtask 6-A: “iSarcasmEval: Intended Sarcasm Detection In English and Arabic – Binary Classification”. We developed different models for two languages, English and Arabic. We applied 4 supervised machine learning methods, 6 preprocessing methods for English and 3 for Arabic, and 3 oversampling methods. Our best submitted model for the English test dataset was an SVC model that balanced the dataset using SMOTE and removed stop words. For the Arabic test dataset, our best submitted model was an SVC model whose preprocessing removed elongation.
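As a rough illustration of two ingredients named above, elongation removal and word n-gram features, here is a minimal standard-library sketch. The abstract does not define "elongation" precisely. This sketch assumes it means the Arabic tatweel character (U+0640) plus runs of three or more identical characters, which it collapses to one. The function names `remove_elongation` and `word_ngrams` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Arabic tatweel (kashida): a stretching character that carries no meaning.
TATWEEL = "\u0640"


def remove_elongation(text: str) -> str:
    """Strip tatweel and collapse runs of 3+ identical characters to one.

    Assumption: one common reading of "removing elongation"; the paper's
    exact rule is not stated in the abstract.
    """
    text = text.replace(TATWEEL, "")
    # (.)\1{2,} matches a character followed by at least two copies of itself.
    return re.sub(r"(.)\1{2,}", r"\1", text)


def word_ngrams(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams for n = 1..n_max over whitespace-separated tokens."""
    tokens = text.split()
    counts: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    print(remove_elongation("sooooo good"))  # prints: so good
    print(word_ngrams("so good so good"))
```

Runs of exactly two characters are left alone so that ordinary English spellings such as "good" survive. In a full system, counts like these would typically be vectorized (for example with scikit-learn's `CountVectorizer(ngram_range=(1, 2))`) and passed to a classifier such as the SVC mentioned above. That wiring is an assumption, not the authors' published code.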
Anthology ID:
2022.semeval-1.145
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
1031–1038
URL:
https://aclanthology.org/2022.semeval-1.145
DOI:
10.18653/v1/2022.semeval-1.145
Cite (ACL):
Yaakov HaCohen-Kerner, Matan Fchima, and Ilan Meyrowitsch. 2022. JCT at SemEval-2022 Task 6-A: Sarcasm Detection in Tweets Written in English and Arabic using Preprocessing Methods and Word N-grams. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1031–1038, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
JCT at SemEval-2022 Task 6-A: Sarcasm Detection in Tweets Written in English and Arabic using Preprocessing Methods and Word N-grams (HaCohen-Kerner et al., SemEval 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.semeval-1.145.pdf