Sandra Camille Sandoval

2025

pdf bib abs
My LLM might Mimic AAE - But When Should It?
Sandra Camille Sandoval | Christabel Acquaye | Kwesi Adu Cobbina | Mohammad Nayeem Teli | Hal Daumé Iii
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We examine the representation of African American English (AAE) in large language models (LLMs), exploring (a) the perceptions Black Americans have of how effective these technologies are at producing authentic AAE, and (b) in what contexts Black Americans find this desirable. Through both a survey of Black Americans (n= 104) and annotation of LLM-produced AAE by Black Americans (n= 228), we find that Black Americans favor choice and autonomy in determining when AAE is appropriate in LLM output. They tend to prefer that LLMs default to communicating in Mainstream U.S. English in formal settings, with greater interest in AAE production in less formal settings. When LLMs were appropriately prompted and provided in context examples, our participants found their outputs to have a level of AAE authenticity on par with transcripts of Black American speech. Select code and data for our project can be found here: https://github.com/smelliecat/AAEMime.git

Co-authors

Venues

naacl1

Fix data