The American Palimpsest: Quantifying South Asian English Dialect Erasure in LLMs

Soumedhik Bharati, Shibam Mandal, Swarup Kr Ghosh, Sayani Mondal


Abstract
Large Language Models are increasingly deployed as writing assistants for users in the Global South, yet rewriting prompts can suppress institutionalized postcolonial varieties. We quantify South Asian English (SAsE) dialect erasure in a state-of-the-art open-weight model using a 500-sentence diagnostic benchmark (320 lexical and 180 syntactic markers). On Llama 3.3 70B, standard grammar correction retains only 26.0% of markers (lexical 31.2%; syntactic 16.7%), while formalization is more destructive (14.0% overall retention). For lexical items, we observe Americanization in 56.2% (correction) and 59.4% (formalization) of cases, typically via Standard American paraphrases. A simple dialect-aware prompt raises retention to 92.0% and reduces lexical Americanization to 6.2%, although some function-word phenomena remain resistant. A stress test shows even stronger suppression (6.7% retention). We position dialect erasure within representational-harm and cultural-competence frameworks, and provide a replicable protocol for auditing writing-assistance systems.
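The overall retention figure for grammar correction is consistent with a size-weighted average of the two category rates. The sketch below checks that arithmetic. It assumes the overall rate is pooled over all 500 items, which the abstract does not state explicitly, and the function name `overall_retention` is illustrative rather than taken from the paper.

```python
# Benchmark composition as stated in the abstract.
N_LEXICAL = 320
N_SYNTACTIC = 180


def overall_retention(lexical_rate: float, syntactic_rate: float) -> float:
    """Size-weighted average of per-category retention rates (as fractions).

    Assumption: the overall rate pools all items, so each category
    contributes in proportion to its number of sentences.
    """
    total = N_LEXICAL + N_SYNTACTIC
    return (N_LEXICAL * lexical_rate + N_SYNTACTIC * syntactic_rate) / total


# Grammar-correction condition, using the rounded rates from the abstract.
correction = overall_retention(0.312, 0.167)
print(f"{correction:.1%}")  # prints 26.0%
```

Because the inputs are already rounded to one decimal place, the pooled value matches the reported 26.0% only to that precision.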
Anthology ID:
2026.c3nlp-1.8
Volume:
Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Yong Cao, Li Zhou, Bolei Ma, Ife Adebara
Venues:
C3NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
108–118
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.8/
Cite (ACL):
Soumedhik Bharati, Shibam Mandal, Swarup Kr Ghosh, and Sayani Mondal. 2026. The American Palimpsest: Quantifying South Asian English Dialect Erasure in LLMs. In Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026), pages 108–118, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
The American Palimpsest: Quantifying South Asian English Dialect Erasure in LLMs (Bharati et al., C3NLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.8.pdf