Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness

Shayan Alipour, Indira Sen, Mattia Samory, Tanu Mitra


Abstract
Despite a growing literature finding that large language models (LLMs) exhibit demographic biases, reports of whom they align with best are hard to generalize, or even contradictory. In this work, we examine the alignment of LLMs with human annotations in five offensive language datasets, comprising approximately 220K annotations. While demographic traits, particularly race, influence alignment, these effects vary across datasets and are often entangled with other factors. Confounders introduced in the annotation process, such as document difficulty, annotator sensitivity, and within-group agreement, account for more variation in alignment patterns than demographic traits do. Alignment increases with annotator sensitivity and group agreement, and decreases with document difficulty. Our results underscore the importance of multi-dataset analyses and confounder-aware methodologies in developing robust measures of demographic bias.
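The abstract describes measuring per-annotator alignment between LLM and human offensiveness labels and asking whether demographic traits still explain alignment once annotation-process confounders are controlled for. Below is a minimal sketch of one way such an analysis could be set up: a logistic regression of per-annotation alignment on race plus the three confounders named in the abstract. The simulated data, column names, and model choice are illustrative assumptions, not the authors' actual code or datasets.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate per-annotation records (hypothetical schema): each row is one
# (annotator, document) pair with the annotator's demographic group and
# the confounders named in the abstract.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "race": rng.choice(["white", "black", "asian"], size=n),
    "doc_difficulty": rng.uniform(0, 1, size=n),    # e.g., item-level disagreement
    "sensitivity": rng.uniform(0, 1, size=n),       # annotator's overall rate of "offensive" labels
    "group_agreement": rng.uniform(0, 1, size=n),   # agreement with same-group annotators
})

# Simulated outcome: alignment (LLM label == human label) driven by the
# confounders rather than by race, mirroring the abstract's finding.
logit_p = 1.0 - 2.0 * df["doc_difficulty"] + 1.5 * df["sensitivity"] + 1.5 * df["group_agreement"]
df["aligned"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Confounder-aware regression: does race predict alignment once document
# difficulty, annotator sensitivity, and within-group agreement are included?
model = smf.logit(
    "aligned ~ C(race) + doc_difficulty + sensitivity + group_agreement",
    data=df,
).fit(disp=False)
print(model.summary())
```

With real annotation data, the race coefficients would indicate how much demographic variation in alignment survives after the confounders are accounted for; in this simulation they are null by construction.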
Anthology ID:
2025.findings-acl.1136
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22025–22047
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.1136/
DOI:
10.18653/v1/2025.findings-acl.1136
Cite (ACL):
Shayan Alipour, Indira Sen, Mattia Samory, and Tanu Mitra. 2025. Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness. In Findings of the Association for Computational Linguistics: ACL 2025, pages 22025–22047, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness (Alipour et al., Findings 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.1136.pdf