On the Role of Speech Data in Reducing Toxicity Detection Bias
Samuel Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà
Abstract
Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTOX dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.- Anthology ID:
- 2025.naacl-long.67
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1454–1468
- Language:
- URL:
- https://preview.aclanthology.org/landing_page/2025.naacl-long.67/
- DOI:
- Cite (ACL):
- Samuel Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, and Marta R. Costa-jussà. 2025. On the Role of Speech Data in Reducing Toxicity Detection Bias. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1454–1468, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- On the Role of Speech Data in Reducing Toxicity Detection Bias (Bell et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.naacl-long.67.pdf