Abstract
We present an extensive evaluation of different fine-tuned models for detecting offensive and abusive language in Dutch across three benchmarks: a standard held-out test set, a task-agnostic functional benchmark, and a dynamic test set. We also investigate the use of data cartography to identify high-quality training data. Our results show that the manually annotated data used to train the models is of relatively good quality, while highlighting some critical weaknesses. We also found good portability of trained models along the same language phenomena. As for data cartography, we found a positive impact only on the functional benchmark, and only when selecting data per annotated dimension rather than using the entire training material.
- Anthology ID:
- 2023.woah-1.7
- Volume:
- The 7th Workshop on Online Abuse and Harms (WOAH)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Yi-ling Chung, Paul Röttger, Debora Nozza, Zeerak Talat, Aida Mostafazadeh Davani
- Venue:
- WOAH
- Publisher:
- Association for Computational Linguistics
- Pages:
- 69–84
- URL:
- https://aclanthology.org/2023.woah-1.7
- DOI:
- 10.18653/v1/2023.woah-1.7
- Cite (ACL):
- Tommaso Caselli and Hylke Van Der Veen. 2023. Benchmarking Offensive and Abusive Language in Dutch Tweets. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 69–84, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Benchmarking Offensive and Abusive Language in Dutch Tweets (Caselli & Van Der Veen, WOAH 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2023.woah-1.7.pdf