DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages

Fatima Zahra Qachfar; Bryan Tuck; Rakesh Verma

Abstract

This paper addresses hate speech detection in Turkish and Arabic tweets, contributing to the HSD-2Lang Shared Task. We propose a specialized pooling strategy within a soft-voting ensemble framework to improve classification in Turkish and Arabic language models. Our approach also includes expanding the training sets through cross-lingual translation, introducing a broader spectrum of hate speech examples. Our method attains F1-Macro scores of 0.6964 for Turkish (Subtask A) and 0.7123 for Arabic (Subtask B). While achieving these results, we also consider the computational overhead, striking a balance between the effectiveness of our unique pooling strategy, data augmentation, and soft-voting ensemble. This approach advances the practical application of language models in low-resource languages for hate speech detection.
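The soft-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each ensemble member outputs per-class probabilities for every tweet; the function name `soft_vote`, the nested-list input layout, and the example numbers are hypothetical and not taken from the paper. The pooling strategy and the translation-based augmentation are not shown, since the abstract does not specify them.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Soft voting: average class probabilities across models, then argmax.

    model_probs[m][i][c] is the probability that model m assigns to
    class c for example i (hypothetical layout, chosen for illustration).
    """
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(model_probs[0][i])
        # Mean probability per class over all ensemble members.
        avg = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Predicted label is the class with the highest mean probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Illustrative probabilities for two tweets from three models
# (class 0 = not hate, class 1 = hate); the values are made up.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.7, 0.3]]
model_c = [[0.2, 0.8], [0.3, 0.7]]

labels = soft_vote([model_a, model_b, model_c])
```

Unlike majority (hard) voting, averaging probabilities lets a confident model outweigh less certain ones, which is the usual motivation for soft voting in an ensemble.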

Anthology ID: 2024.case-1.28
Volume: Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Month: March
Year: 2024
Address: St. Julians, Malta
Editors: Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa, Gökçe Uludoğan
Venues: CASE | WS
Publisher: Association for Computational Linguistics
Pages: 199–204
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.case-1.28/
Cite (ACL): Fatima Zahra Qachfar, Bryan Tuck, and Rakesh Verma. 2024. DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages. In Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), pages 199–204, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal): DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages (Qachfar et al., CASE 2024)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.case-1.28.pdf
Supplementary material: 2024.case-1.28.SupplementaryMaterial.txt
Video: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.case-1.28.mp4
