Francisco J. Ribadas Pena


2019

CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets
Sattam Almatarneh | Pablo Gamallo | Francisco J. Ribadas Pena
Proceedings of the 13th International Workshop on Semantic Evaluation

This article describes the strategy submitted by the CiTIUS-COLE team to SemEval 2019 Task 5, a binary classification task in which the system predicts whether a tweet in English or Spanish is hateful against women or immigrants. The proposed strategy relies on combining linguistic features to improve the classifier’s performance. More precisely, the method combines textual and lexical features, pairing word embeddings with a bag-of-words representation weighted by Term Frequency-Inverse Document Frequency (TF-IDF). The system reaches about 81% F1 on the training dataset, but on the official test dataset its F1 for the hate speech class drops to 36% for English and 64% for Spanish.
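The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the embedding model, or how word vectors are aggregated, so the smoothed IDF formula, the averaging of embeddings, the toy vectors, and the function names `tfidf_vectors`, `mean_embedding`, and `combine_features` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of known tokens (averaging is an assumption)."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def combine_features(docs, embeddings, dim):
    """Concatenate each document's TF-IDF vector with its mean embedding."""
    _, tfidf = tfidf_vectors(docs)
    return [row + mean_embedding(doc, embeddings, dim)
            for row, doc in zip(tfidf, docs)]


# Hypothetical toy data: two tokenized tweets and 2-dimensional embeddings.
docs = [["go", "home", "now"], ["welcome", "home"]]
toy_embeddings = {"home": [1.0, 0.0], "welcome": [0.0, 1.0]}
features = combine_features(docs, toy_embeddings, dim=2)
```

Each row of `features` holds the TF-IDF weights for the four vocabulary terms followed by the two embedding dimensions, and could be passed to any standard classifier.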