DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis
Christoph Demus, Jonas Pitz, Mina Schütz, Nadine Probol, Melanie Siegel, Dirk Labudde
Abstract
In this work, we present a new publicly available offensive language dataset of 10,278 German social media comments, collected in the first half of 2021 and annotated by a total of six annotators. With twelve annotation categories, it is far more comprehensive than other datasets and goes beyond hate speech detection alone. In particular, the labels also cover toxicity, criminal relevance, and discrimination types of comments. Furthermore, about half of the comments come from coherent parts of conversations, which makes it possible to take the comments’ context into account and to conduct conversation analyses that investigate the contagion of offensive language in conversations.
- Anthology ID:
- 2022.woah-1.14
- Volume:
- Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington (Hybrid)
- Editors:
- Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
- Venue:
- WOAH
- Publisher:
- Association for Computational Linguistics
- Pages:
- 143–153
- URL:
- https://aclanthology.org/2022.woah-1.14
- DOI:
- 10.18653/v1/2022.woah-1.14
- Cite (ACL):
- Christoph Demus, Jonas Pitz, Mina Schütz, Nadine Probol, Melanie Siegel, and Dirk Labudde. 2022. DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 143–153, Seattle, Washington (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis (Demus et al., WOAH 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.woah-1.14.pdf
- Code:
- hdasprachtechnologie/detox