Tw-StAR at SemEval-2019 Task 5: N-gram embeddings for Hate Speech Detection in Multilingual Tweets
Hala Mulki, Chedi Bechikh Ali, Hatem Haddad, Ismail Babaoğlu
Abstract
In this paper, we describe our contribution to SemEval-2019: subtask A of Task 5, “Multilingual detection of hate speech against immigrants and women in Twitter (HatEval)”. We developed two hate speech detection model variants through the Tw-StAR framework. While the first model used one-hot encoded n-grams to train a Naive Bayes (NB) classifier, the second generated and learned n-gram embeddings within a feedforward neural network. For both models, specific terms, selected via MWT patterns, were tagged in the input data. With two feature types employed, we could investigate the ability of n-gram embeddings to rival one-hot n-grams. Our results showed that in English, n-gram embeddings outperformed one-hot n-grams. However, representing Spanish tweets by one-hot n-grams yielded slightly better performance than n-gram embeddings. The official ranking indicated that Tw-StAR ranked 9th for English and 20th for Spanish.
- Anthology ID:
- S19-2090
- Volume:
- Proceedings of the 13th International Workshop on Semantic Evaluation
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota, USA
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 503–507
- URL:
- https://aclanthology.org/S19-2090
- DOI:
- 10.18653/v1/S19-2090
- Cite (ACL):
- Hala Mulki, Chedi Bechikh Ali, Hatem Haddad, and Ismail Babaoğlu. 2019. Tw-StAR at SemEval-2019 Task 5: N-gram embeddings for Hate Speech Detection in Multilingual Tweets. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 503–507, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
- Cite (Informal):
- Tw-StAR at SemEval-2019 Task 5: N-gram embeddings for Hate Speech Detection in Multilingual Tweets (Mulki et al., SemEval 2019)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/S19-2090.pdf
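To make the first model variant in the abstract more concrete, the following is a minimal sketch of one-hot (presence/absence) word n-gram features feeding a Naive Bayes classifier. It is not the authors' implementation: the function names (`word_ngrams`, `train_nb`, `predict_nb`), the n-gram range, the Laplace smoothing, and the toy training texts are all assumptions made for illustration, and the MWT-based term tagging mentioned in the abstract is omitted.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a text.

    Using a set gives presence/absence features, i.e. a one-hot style
    encoding rather than frequency counts.
    """
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def train_nb(texts, labels, n_max=2):
    """Collect per-class document frequencies for each n-gram."""
    vocab = set()
    doc_counts = {}                 # label -> Counter of n-gram document frequencies
    class_counts = Counter(labels)  # label -> number of training documents
    for text, label in zip(texts, labels):
        grams = word_ngrams(text, n_max)
        vocab |= grams
        doc_counts.setdefault(label, Counter()).update(grams)
    return {"vocab": vocab, "doc_counts": doc_counts,
            "class_counts": class_counts, "n_max": n_max}


def predict_nb(model, text):
    """Pick the class with the highest log-probability.

    Simplification (assumption): only n-grams present in the input and
    known from training are scored; absent features are ignored.
    """
    grams = word_ngrams(text, model["n_max"]) & model["vocab"]
    total_docs = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # class prior
        for g in grams:
            # Laplace-smoothed probability that g appears in a document of this class
            score += math.log((model["doc_counts"][label][g] + 1) / (n_docs + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data (1 = hateful, 0 = not hateful); not taken from the HatEval corpus.
train_texts = [
    "go back home you people",
    "you people are awful",
    "lovely day in the park",
    "great lunch with friends",
]
train_labels = [1, 1, 0, 0]

model = train_nb(train_texts, train_labels)
print(predict_nb(model, "you people ruin everything"))
print(predict_nb(model, "lunch in the park"))
```

The second variant described in the abstract would replace these sparse presence features with dense n-gram embeddings learned inside a feedforward network; that part is not sketched here because the abstract does not give the architecture details.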