Detoxify-IT: An Italian Parallel Dataset for Text Detoxification
Viola De Ruvo, Arianna Muti, Daryna Dementieva, Debora Nozza
Abstract
Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into a neutral form, offers a promising alternative but remains underexplored beyond English. We present Detoxify-IT, the first Italian dataset for this task, featuring toxic comments and their human-written neutral rewrites. Our experiments show that even limited fine-tuning on Italian data leads to notable improvements in content preservation and fluency compared to both multilingual models and LLMs used in zero-shot settings, underlining the need for language-specific resources. This work enables detoxification research in Italian and supports broader efforts toward safer, more inclusive online communication.
- Anthology ID: 2025.woah-1.24
- Volume: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
- Month: August
- Year: 2025
- Address: Vienna, Austria
- Editors: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
- Venues: WOAH | WS
- Publisher: Association for Computational Linguistics
- Pages: 267–275
- URL: https://preview.aclanthology.org/landing_page/2025.woah-1.24/
- Cite (ACL): Viola De Ruvo, Arianna Muti, Daryna Dementieva, and Debora Nozza. 2025. Detoxify-IT: An Italian Parallel Dataset for Text Detoxification. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 267–275, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Detoxify-IT: An Italian Parallel Dataset for Text Detoxification (De Ruvo et al., WOAH 2025)
- PDF: https://preview.aclanthology.org/landing_page/2025.woah-1.24.pdf