JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection

Kaan Büyükdemirci; Izzet Emre Kucukkaya; Eren Ölmez; Cagri Toraman

Abstract

Hate speech detection has been studied extensively, and machine learning algorithms play a crucial role in it. Existing resources mostly treat hate speech detection as a text sequence classification task. The target of hateful content, however, is a dimension that has not been studied in detail because data resources are lacking. In this study, we address this gap by introducing JL-Hate, a novel tweet dataset for joint learning of hate speech detection and target detection, framed as sequence classification and token classification, respectively. The JL-Hate dataset consists of 1,530 tweets, divided equally between English and Turkish. Leveraging this dataset, we conduct a series of benchmark experiments in which a joint learning model performs sequence and token classification concurrently on our data. Our experimental results show performance consistent with prior studies on both the sequence and the token classification task.

Anthology ID: 2024.lrec-main.834
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 9543–9553
URL: https://aclanthology.org/2024.lrec-main.834
Cite (ACL): Kaan Büyükdemirci, Izzet Emre Kucukkaya, Eren Ölmez, and Cagri Toraman. 2024. JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9543–9553, Torino, Italia. ELRA and ICCL.
Cite (Informal): JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection (Büyükdemirci et al., LREC-COLING 2024)
PDF: https://preview.aclanthology.org/landing_page/2024.lrec-main.834.pdf
