GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter

Diego Benito, Oscar Araque, Carlos A. Iglesias


Abstract
This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech on Twitter. The main contribution of the paper is a method based on word embeddings and semantic similarity, combined with traditional features such as n-grams, TF-IDF weighting and part-of-speech (POS) tags. The combination of features is fine-tuned through ablation tests, which demonstrate the usefulness of the different features. Our approach outperforms the baseline classifiers on several sub-tasks, and the best of our submitted runs reached 5th position on Spanish sub-task A.
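The abstract combines embedding-based semantic similarity features with n-gram TF-IDF features and tunes the mix through ablation. The sketch below illustrates what such a feature combination could look like. It is not the authors' implementation, and this page does not say how their similarity features are computed. The scheme used here is an assumption: for each word in a seed lexicon, take the maximum cosine similarity to any token in the tweet. `TOY_EMBEDDINGS`, the seed lexicon, and every function name are hypothetical. POS features are left out because they would need a tagger. Plain standard-library Python stands in for libraries such as scikit-learn or gensim.

```python
import math
from collections import Counter

# HYPOTHETICAL toy word embeddings; a real system would load pre-trained vectors.
TOY_EMBEDDINGS = {
    "go": [0.9, 0.1, 0.0],
    "home": [0.8, 0.2, 0.1],
    "welcome": [0.1, 0.9, 0.2],
    "refugees": [0.2, 0.3, 0.9],
    "invaders": [0.7, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_ngrams(tokens, n_max=2):
    """All contiguous word n-grams of length 1..n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Sparse TF-IDF dict per document, using a smoothed IDF: ln((1+N)/(1+df)) + 1."""
    grams_per_doc = [word_ngrams(d.lower().split(), n_max) for d in docs]
    n_docs = len(docs)
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    return [{g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
             for g, tf in Counter(grams).items()}
            for grams in grams_per_doc]


def similarity_features(text, lexicon, embeddings=TOY_EMBEDDINGS):
    """ASSUMED scheme: per lexicon word, max cosine similarity to any known token."""
    vecs = [embeddings[t] for t in text.lower().split() if t in embeddings]
    return [max((cosine(embeddings[w], v) for v in vecs), default=0.0)
            if w in embeddings else 0.0
            for w in lexicon]


def combine_features(tfidf_vec, sim_feats, lexicon,
                     use_ngrams=True, use_similarity=True):
    """Merge feature groups into one sparse dict; the flags allow ablation tests."""
    features = {}
    if use_ngrams:
        features.update({"ngram:" + g: w for g, w in tfidf_vec.items()})
    if use_similarity:
        features.update({"sim:" + w: s for w, s in zip(lexicon, sim_feats)})
    return features


if __name__ == "__main__":
    docs = ["go home invaders", "welcome refugees"]
    lexicon = ["invaders", "welcome"]  # HYPOTHETICAL seed lexicon
    for doc, vec in zip(docs, tfidf_vectors(docs)):
        sims = similarity_features(doc, lexicon)
        full = combine_features(vec, sims, lexicon)
        ablated = combine_features(vec, sims, lexicon, use_similarity=False)
        print(doc, len(full), len(ablated))
```

An ablation test in this setting means training and evaluating a classifier once per combination of the `use_*` flags, then comparing scores. This sketch leaves out the classifier itself.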
Anthology ID: S19-2070
Volume: Proceedings of the 13th International Workshop on Semantic Evaluation
Month: June
Year: 2019
Address: Minneapolis, Minnesota, USA
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 396–403
URL: https://aclanthology.org/S19-2070
DOI: 10.18653/v1/S19-2070
Cite (ACL):
Diego Benito, Oscar Araque, and Carlos A. Iglesias. 2019. GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 396–403, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
Cite (Informal):
GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter (Benito et al., SemEval 2019)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/S19-2070.pdf