A Survey of Toxicity Mitigation Strategies for Multilingual Language Models

Soham Dan, Himanshu Beniwal, Thomas Hartvigsen


Abstract
Large language models (LLMs) are transforming natural language processing across diverse linguistic communities. However, they can reproduce and amplify toxic content, including hate speech, harassment, and bias, posing significant risks to multilingual applications. We provide the first comprehensive survey of detoxification methods specifically tailored to multilingual LLMs. First, we define toxicity and its measurement; then we briefly review monolingual mitigation strategies, including data filtering, style transfer, expert-based logit steering, retrieval augmentation, and alignment with human feedback. We then present an in-depth taxonomy of multilingual approaches spanning (1) training methods, (2) post-hoc editing and decoding strategies, (3) alignment and reinforcement-learning techniques, and (4) data-centric innovations, such as parallel detox corpora and synthetic data generation. Finally, we discuss open challenges in multilingual detoxification, including data scarcity, evaluation inconsistencies, and cultural nuances and biases. Overall, we provide a needed overview of the state of multilingual toxicity detection and mitigation on which the community can ground efforts to build globally safe and equitable LLMs.
Anthology ID:
2026.findings-acl.1780
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
35761–35774
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1780/
Cite (ACL):
Soham Dan, Himanshu Beniwal, and Thomas Hartvigsen. 2026. A Survey of Toxicity Mitigation Strategies for Multilingual Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35761–35774, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
A Survey of Toxicity Mitigation Strategies for Multilingual Language Models (Dan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1780.pdf
Checklist:
 2026.findings-acl.1780.checklist.pdf