♪ Something Just Like TRuST ♪ *: Toxicity Recognition of Span and Target

Berk Atıl, Namrata Sureddy, Rebecca J. Passonneau

Abstract

Toxic language includes content that is offensive, abusive, or that promotes harm. Progress in preventing toxic output from large language models (LLMs) is hampered by inconsistent definitions of toxicity. We introduce TRuST, a large-scale dataset that unifies and expands prior resources through a carefully synthesized definition of toxicity and a corresponding annotation scheme. It contains ∼300k annotations, including high-quality human annotations for ∼11k. To ensure high quality, we designed a rigorous, multi-stage human annotation process and evaluated the diversity of the annotators. We then benchmarked state-of-the-art LLMs and pre-trained language models (PLMs) on three tasks: toxicity detection, identification of the targeted group, and identification of toxic words. Our results indicate that fine-tuned PLMs outperform LLMs on all three tasks, and that current reasoning models do not reliably improve performance. TRuST is one of the most comprehensive resources for evaluating and mitigating LLM toxicity, and for broader research on socially aware, safer language technologies.
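The three benchmark tasks can be pictured as three labels attached to each text: a binary toxicity label, the targeted group(s), and the positions of toxic words. The sketch is a hypothetical illustration only: the class `ToxicityExample`, its field names, and the token-level F1 used to score toxic-word identification are assumptions made for this example, not TRuST's released format or the paper's evaluation protocol.

```python
from dataclasses import dataclass, field


# Hypothetical record layout (NOT the official TRuST schema): one text
# carrying labels for the three tasks named in the abstract.
@dataclass
class ToxicityExample:
    text: str
    is_toxic: bool                                        # task 1: toxicity detection
    target_groups: set[str] = field(default_factory=set)  # task 2: targeted group(s)
    toxic_words: set[int] = field(default_factory=set)    # task 3: indices of toxic tokens


def word_f1(gold: set[int], pred: set[int]) -> float:
    """Token-level F1 between gold and predicted toxic-word indices (assumed metric)."""
    if not gold and not pred:
        return 1.0  # neither side marks any toxic word: treat as full agreement
    tp = len(gold & pred)  # indices marked toxic by both gold and prediction
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder text; token indices 2 and 3 stand in for toxic words.
    ex = ToxicityExample(text="w0 w1 w2 w3 w4", is_toxic=True,
                         target_groups={"group_a"}, toxic_words={2, 3})
    print(word_f1(ex.toxic_words, {3, 4}))  # one of two indices overlaps on each side
```

Scoring toxic words as a set of token indices keeps the third task independent of how a given model tokenizes; a character-offset variant would work the same way with offsets in place of indices.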

Anthology ID: 2026.findings-acl.1854
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 37231–37251
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1854/
Cite (ACL): Berk Atıl, Namrata Sureddy, and Rebecca J. Passonneau. 2026. ♪ Something Just Like TRuST ♪ *: Toxicity Recognition of Span and Target. In Findings of the Association for Computational Linguistics: ACL 2026, pages 37231–37251, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ♪ Something Just Like TRuST ♪ *: Toxicity Recognition of Span and Target (Atıl et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1854.pdf
Checklist: 2026.findings-acl.1854.checklist.pdf