Abstract
This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech against immigrants and women on Twitter. The main contribution of the paper is a method based on word embeddings and semantic similarity, combined with traditional features such as n-grams, TF-IDF and part-of-speech (POS) tags. This combination of features is fine-tuned through ablation tests, which demonstrate the usefulness of the different feature groups. Our approach outperforms the baseline classifiers on several sub-tasks, and the best of our submitted runs reached 5th position on the Spanish sub-task A.
- Anthology ID:
- S19-2070
- Volume:
- Proceedings of the 13th International Workshop on Semantic Evaluation
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota, USA
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 396–403
- URL:
- https://aclanthology.org/S19-2070
- DOI:
- 10.18653/v1/S19-2070
- Cite (ACL):
- Diego Benito, Oscar Araque, and Carlos A. Iglesias. 2019. GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 396–403, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
- Cite (Informal):
- GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter (Benito et al., SemEval 2019)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/S19-2070.pdf
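The abstract combines traditional lexical features (n-grams weighted by TF-IDF) with semantic similarity features computed over word embeddings, and tunes the combination through ablation tests. The Python sketch below illustrates that general idea under stated assumptions. It is not the GSI-UPM implementation. The toy `EMBEDDINGS` vectors, the seed `LEXICON`, the "max cosine similarity to each lexicon word" feature, and every function name are hypothetical. POS features are omitted. A real system would use pre-trained multilingual embeddings and feed the rows to a trained classifier.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real system would load pre-trained vectors.
EMBEDDINGS = {
    "go": [0.9, 0.1, 0.0],
    "home": [0.8, 0.2, 0.1],
    "welcome": [0.1, 0.9, 0.2],
    "refugees": [0.3, 0.4, 0.9],
    "immigrants": [0.35, 0.35, 0.85],
}

# Hypothetical seed lexicon that the similarity features are measured against.
LEXICON = ["immigrants", "welcome"]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def ngrams(tokens, n_max=2):
    """All word n-grams from 1 up to n_max, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(corpus_tokens, n_max=2):
    """Return one {ngram: weight} dict per document, using smoothed IDF."""
    docs = [Counter(ngrams(toks, n_max)) for toks in corpus_tokens]
    n_docs = len(docs)
    df = Counter(g for d in docs for g in d)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    return [{g: tf * idf[g] for g, tf in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_features(tokens):
    """For each lexicon word, the max cosine similarity to any in-vocabulary token."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return [max((cosine(EMBEDDINGS[w], v) for v in vecs), default=0.0) for w in LEXICON]


def build_features(texts, use_tfidf=True, use_similarity=True):
    """Concatenate the enabled feature groups; disabling one mimics an ablation run."""
    tokens = [tokenize(t) for t in texts]
    vocab, weights = [], []
    if use_tfidf:
        weights = tfidf(tokens)
        vocab = sorted({g for w in weights for g in w})
    rows = []
    for i, toks in enumerate(tokens):
        row = [weights[i].get(g, 0.0) for g in vocab] if use_tfidf else []
        if use_similarity:
            row += similarity_features(toks)
        rows.append(row)
    return rows


texts = ["Go home immigrants", "Refugees welcome"]
full = build_features(texts)                       # TF-IDF n-grams + similarity
no_sim = build_features(texts, use_similarity=False)
no_tfidf = build_features(texts, use_tfidf=False)
```

Switching `use_tfidf` or `use_similarity` off drops one feature group, which is one simple way to set up an ablation run; comparing a classifier's scores across such runs would indicate how much each group contributes.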