Detoxify-IT: An Italian Parallel Dataset for Text Detoxification

Viola De Ruvo, Arianna Muti, Daryna Dementieva, Debora Nozza


Abstract
Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into a neutral form while preserving its meaning, offers a promising alternative to simply removing such content, but it remains underexplored beyond English. We present Detoxify-IT, the first Italian dataset for this task, which pairs toxic comments with human-written neutral rewrites. Our experiments show that even limited fine-tuning on Italian data leads to notable improvements in content preservation and fluency compared with both multilingual models and LLMs used in zero-shot settings, underlining the need for language-specific resources. This work enables detoxification research in Italian and supports broader efforts toward safer, more inclusive online communication.
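A parallel detoxification dataset stores each toxic comment together with a human-written neutral rewrite. That rewrite can then serve as a reference when checking how much of the original content a system output keeps. The abstract does not name the content-preservation metric. The sketch below is therefore only an illustration under stated assumptions. The `DetoxPair` record and its field names are hypothetical and do not reflect the dataset's actual schema. `char_f_score` is a simplified character n-gram F-score in the spirit of chrF, not necessarily the metric used in the paper. The example sentences are invented toy strings, not entries from Detoxify-IT.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DetoxPair:
    """One parallel example: a toxic comment and its neutral rewrite.
    Hypothetical field names; not the dataset's actual schema."""
    toxic: str
    neutral: str


def char_ngrams(text: str, n: int) -> Counter:
    # Count overlapping character n-grams after removing spaces (as chrF does by default).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_f_score(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score (an assumption, not the paper's metric).

    Precision and recall are averaged over n-gram orders 1..max_n that have
    at least one n-gram, then combined into an F-beta score; beta=2 weights
    recall more heavily, as in chrF.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        if sum(h.values()) > 0:
            precisions.append(overlap / sum(h.values()))
        if sum(r.values()) > 0:
            recalls.append(overlap / sum(r.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Invented toy pair for illustration only (not from Detoxify-IT).
    pair = DetoxPair(
        toxic="Sei un idiota, questa idea è stupida",
        neutral="Non sono d'accordo, questa idea non mi convince",
    )
    # Hypothetical system output, scored against the human neutral rewrite.
    system_output = "Questa idea non mi sembra buona"
    print(f"{char_f_score(system_output, pair.neutral):.3f}")
```

Character n-grams are used here because they tolerate inflectional variation, which is common in Italian. A full evaluation would normally also check that the output is no longer toxic and that it is fluent. Those checks are outside the scope of this sketch.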
Anthology ID: 2025.woah-1.24
Volume: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues: WOAH | WS
Publisher: Association for Computational Linguistics
Pages: 267–275
URL: https://preview.aclanthology.org/landing_page/2025.woah-1.24/
Cite (ACL): Viola De Ruvo, Arianna Muti, Daryna Dementieva, and Debora Nozza. 2025. Detoxify-IT: An Italian Parallel Dataset for Text Detoxification. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 267–275, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Detoxify-IT: An Italian Parallel Dataset for Text Detoxification (De Ruvo et al., WOAH 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.woah-1.24.pdf
Supplementary material: 2025.woah-1.24.SupplementaryMaterial.zip