Speak No Evil, Just Prompt: Low-resource Multilingual Toxic Speech Detection with Audio Language Model
Mingzi Zuo, Lei Zhang, Hailiang Sun, Shengzhi Huo, Changyu Dong, Xin Wang, Bo Wang, Hao Liu
Abstract
The widespread dissemination of toxic content on online platforms poses a critical threat to user experience. Toxicity detection in speech receives significantly less research attention than its text counterpart. Most existing methods rely on high-resource languages and employ a cascaded pipeline combining automatic speech recognition (ASR) and text classifiers. These designs limit robustness in low-resource languages and discard important acoustic cues. To address the lack of datasets, we construct PolySpeechTox, the first toxicity-annotated speech dataset spanning 53 languages and accent varieties, with a focus on low-resource languages and multiple accents. Based on PolySpeechTox, we conduct the first systematic study of toxic speech detection under low-resource, multilingual, and multi-accent conditions. We propose SoftPrompt-TSD, a prompt-based adaptation framework that leverages a frozen audio language model to perform end-to-end toxicity detection without ASR. The decomposed soft-prompt design balances global task alignment, cross-lingual generalization, and language-specific or accent-specific calibration. On PolySpeechTox, SoftPrompt-TSD achieves a micro-averaged ROC-AUC of 98.07%, mitigating the severe failures observed in baseline methods for several languages. In three generalization experiments, SoftPrompt-TSD demonstrates superior generalization capability and maintains robust performance against distribution shifts.
- Anthology ID:
- 2026.findings-acl.439
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9039–9053
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.439/
- Cite (ACL):
- Mingzi Zuo, Lei Zhang, Hailiang Sun, Shengzhi Huo, Changyu Dong, Xin Wang, Bo Wang, and Hao Liu. 2026. Speak No Evil, Just Prompt: Low-resource Multilingual Toxic Speech Detection with Audio Language Model. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9039–9053, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Speak No Evil, Just Prompt: Low-resource Multilingual Toxic Speech Detection with Audio Language Model (Zuo et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.439.pdf