Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators
Chunling Niu, Kelly Bradley, Biao Ma, Brian Waltman, Loren Cossette, Rui Jin
Abstract
Using Multi-Facet Rasch Modeling on 36,400 safety ratings of AI-generated conversations, we reveal significant racial disparities in detection rates (Asian 39.1% vs. White 28.7%) and content-specific bias patterns. Simulations show that diverse teams of 8–10 members achieve reliability above 70%, versus 62% for smaller homogeneous teams, providing evidence-based guidelines for moderating AI-generated content.
- Anthology ID:
- 2025.aimecon-main.42
- Volume:
- Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
- Month:
- October
- Year:
- 2025
- Address:
- Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
- Editors:
- Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
- Venue:
- AIME-Con
- National Council on Measurement in Education (NCME)
- Pages:
- 393–397
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.42/
- Cite (ACL):
- Chunling Niu, Kelly Bradley, Biao Ma, Brian Waltman, Loren Cossette, and Rui Jin. 2025. Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 393–397, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
- Cite (Informal):
- Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators (Niu et al., AIME-Con 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-main.42.pdf
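The Multi-Facet Rasch Model named in the abstract decomposes the log-odds of a rating into separate facets, commonly rater severity, item (conversation) difficulty, and rating-category thresholds. The sketch below is a minimal, generic rating-scale MFRM probability function for illustration only; the function name, parameterization, and example values are assumptions, not the authors' implementation.

```python
import math

def mfrm_category_probs(theta, item_diff, rater_severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Adjacent-category log-odds: ln(P_k / P_{k-1}) = theta - item_diff
    - rater_severity - tau_k, where tau_k are the category thresholds.
    Returns a list of len(thresholds) + 1 probabilities summing to 1.
    """
    eta = theta - item_diff - rater_severity
    # Cumulative logit for each category; category 0 is the reference (logit 0).
    logits = [0.0]
    running = 0.0
    for tau in thresholds:
        running += eta - tau
        logits.append(running)
    # Softmax over cumulative logits, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: a lenient vs. severe rater on the same conversation.
lenient = mfrm_category_probs(theta=0.5, item_diff=0.0,
                              rater_severity=-1.0, thresholds=[-1.0, 1.0])
severe = mfrm_category_probs(theta=0.5, item_diff=0.0,
                             rater_severity=1.0, thresholds=[-1.0, 1.0])
```

Under this parameterization, a more severe rater (higher `rater_severity`) shifts probability mass toward lower rating categories, which is how MFRM separates rater bias from true content safety level.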