Who Speaks Matters: Analysing the Influence of the Speaker’s Linguistic Identity on Hate Classification

Ananya Malik, Kartik Sharma, Shaily Bhatt, Lynnette Hui Xian Ng


Abstract
Large Language Models (LLMs) hold considerable promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. Their application to high-stakes tasks like hate speech detection therefore requires critical scrutiny. In this work, we investigate the robustness of hate speech classification using LLMs, particularly when explicit and implicit markers of the speaker’s ethnicity are injected into the input. For explicit markers, we inject a phrase that mentions the speaker’s linguistic identity. For implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 3 LLMs and 1 LM, and across 5 linguistic identities. We find that the presence of implicit dialect markers in inputs causes model outputs to flip more often than the presence of explicit markers. Further, the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for caution in deploying LLMs for high-stakes tasks like hate speech detection.
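The flip-based measurement described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the marker template, the placeholder identity name "Dialect X", the label strings, and the toy classifier are hypothetical and are not taken from the paper. The sketch only shows how a flip percentage could be computed from predictions made with and without an explicit marker.

```python
from typing import Sequence


def inject_explicit_marker(text: str, identity: str) -> str:
    """Prefix the input with a phrase naming the speaker's linguistic identity.

    The template is a hypothetical placeholder, not the paper's prompt.
    """
    return f"A speaker of {identity} says: {text}"


def flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Percentage of inputs whose predicted label changes after perturbation."""
    if len(original) != len(perturbed):
        raise ValueError("label sequences must have equal length")
    if not original:
        return 0.0
    flips = sum(a != b for a, b in zip(original, perturbed))
    return 100.0 * flips / len(original)


def toy_classifier(text: str) -> str:
    """Stand-in for a model call (assumption); deliberately brittle to the marker."""
    if "awful" in text or "Dialect X" in text:
        return "hate"
    return "not_hate"


texts = ["have a nice day", "those people are awful", "see you tomorrow"]
base = [toy_classifier(t) for t in texts]
marked = [toy_classifier(inject_explicit_marker(t, "Dialect X")) for t in texts]
rate = flip_rate(base, marked)
print(f"flip rate: {rate:.1f}%")
```

In a real evaluation, `toy_classifier` would be replaced by a call to the model under test, and the same `flip_rate` computation could be repeated per identity and per marker type (explicit phrase versus dialectal rewrite) to compare brittleness.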
Anthology ID:
2025.findings-emnlp.1357
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24927–24937
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1357/
DOI:
10.18653/v1/2025.findings-emnlp.1357
Cite (ACL):
Ananya Malik, Kartik Sharma, Shaily Bhatt, and Lynnette Hui Xian Ng. 2025. Who Speaks Matters: Analysing the Influence of the Speaker’s Linguistic Identity on Hate Classification. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24927–24937, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Who Speaks Matters: Analysing the Influence of the Speaker’s Linguistic Identity on Hate Classification (Malik et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1357.pdf
Checklist:
 2025.findings-emnlp.1357.checklist.pdf