Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages

Yujia Hu; Ming Shan Hee; Preslav Nakov; Roy Ka-Wei Lee

Abstract

The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore’s diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. SGToxicGuard adopts a red-teaming approach to systematically probe LLM vulnerabilities in three real-world scenarios: conversation, question-answering, and content composition. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails. By offering actionable insights into cultural sensitivity and toxicity mitigation, we lay the foundation for safer and more inclusive AI systems in linguistically diverse environments.

Disclaimer: This paper contains sensitive content that may be disturbing to some readers.

Anthology ID: 2025.emnlp-main.612
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12194–12212
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.612/
Cite (ACL): Yujia Hu, Ming Shan Hee, Preslav Nakov, and Roy Ka-Wei Lee. 2025. Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12194–12212, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages (Hu et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.612.pdf
Checklist: 2025.emnlp-main.612.checklist.pdf