That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects

William Coggins, Jasmine McKenzie, Sangpil Youm, Pradham Mummaleti, Juan Gilbert, Eric Ragan, Bonnie J Dorr


Abstract
As Large Language Models (LLMs) reach mainstream public use, understanding how users interact with them becomes increasingly important. Limited variety in training data raises concerns about LLM reliability across different language inputs. To explore this, we test several LLMs with functionally equivalent prompts expressed in different English sublanguages. We frame the analysis around Question-Answer (QA) pairs, which let us detect and evaluate both appropriate and anomalous model behavior. We contribute a cross-LLM testing method and a new QA dataset translated into African American Vernacular English (AAVE) and West African Pidgin English (WAPE) variants. Early results reveal a notable drop in accuracy for one sublanguage relative to the baseline.
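The evaluation loop the abstract describes can be made concrete with a minimal sketch: each QA item carries functionally equivalent question renderings per dialect, every model is queried with each rendering, and per-dialect accuracy is compared against the baseline. The sketch below assumes a generic text-in/text-out model interface; `query_model`, the toy stub, the dialect labels, and the lenient substring-match scoring are all illustrative assumptions, not the paper's actual dataset, APIs, or metric.

```python
from dataclasses import dataclass

@dataclass
class QAItem:
    questions: dict[str, str]  # dialect label -> translated question
    answer: str                # gold answer, shared across dialects

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (toy stub for demo)."""
    return "Paris is the capital." if "capital" in prompt.lower() else "I don't know."

def evaluate(models: list[str], items: list[QAItem],
             dialects: list[str]) -> dict[tuple[str, str], float]:
    """Accuracy per (model, dialect) cell over functionally equivalent prompts."""
    hits = {(m, d): 0 for m in models for d in dialects}
    for m in models:
        for item in items:
            for d in dialects:
                reply = query_model(m, item.questions[d])
                # Lenient substring match; the paper's scoring may differ.
                if item.answer.lower() in reply.lower():
                    hits[(m, d)] += 1
    return {cell: n / len(items) for cell, n in hits.items()}

if __name__ == "__main__":
    items = [QAItem(
        questions={
            "baseline": "What is the capital of France?",
            "AAVE": "What the capital of France be?",      # illustrative only
            "WAPE": "Wetin be di capital of France?",      # illustrative only
        },
        answer="Paris",
    )]
    print(evaluate(["model-a", "model-b"], items, ["baseline", "AAVE", "WAPE"]))
```

Comparing each dialect column against the baseline column then surfaces per-sublanguage accuracy drops of the kind the abstract reports.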
Anthology ID: 2025.winlp-main.21
Volume: Proceedings of the 9th Widening NLP Workshop
Month: November
Year: 2025
Address: Suzhou, China
Editors: Chen Zhang, Emily Allaway, Hua Shen, Lesly Miculicich, Yinqiao Li, Meryem M'hamdi, Peerat Limkonchotiwat, Richard He Bai, Santosh T.y.s.s., Sophia Simeng Han, Surendrabikram Thapa, Wiem Ben Rim
Venues: WiNLP | WS
Publisher: Association for Computational Linguistics
Pages: 123–129
URL: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.21/
Cite (ACL): William Coggins, Jasmine McKenzie, Sangpil Youm, Pradham Mummaleti, Juan Gilbert, Eric Ragan, and Bonnie J Dorr. 2025. That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects. In Proceedings of the 9th Widening NLP Workshop, pages 123–129, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects (Coggins et al., WiNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.21.pdf