Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixing
Mukund Choudhary, Madhur Jindal, Gaurja Aeron, Monojit Choudhury
Abstract
Do large language models (LLMs) model linguistic variation? We investigate this question through Hindi-English (Hinglish) verb code-mixing, where speakers can use either a Hindi verb or an English verb with the light verb karna ('do'). Both forms are grammatical, but speakers show unexplained variation in language choice for the verb. We compare human preferences on controlled code-mixed minimal pairs to LLM perplexities spanning families, sizes, and training language compositions. We find that current LLMs do not reliably classify verb language preferences to match native speaker judgments. We also see that with specific supervision, some models do predict human preference to an extent. We release native speaker acceptability judgments on 30 verb pairs, perplexity ratios for 4,279 verb pairs across 7 models, and experimental materials.
- Anthology ID:
- 2026.findings-eacl.291
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2026
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5491–5509
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.291/
- Cite (ACL):
- Mukund Choudhary, Madhur Jindal, Gaurja Aeron, and Monojit Choudhury. 2026. Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixing. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5491–5509, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixing (Choudhary et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.291.pdf
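
The abstract compares human preferences on minimal pairs to LLM perplexities and reports perplexity ratios per verb pair. The following is a minimal sketch of how such a ratio could be computed from per-token log-probabilities; it is not the authors' released code. The function names, the direction of the ratio (English-verb variant over Hindi-verb variant), and the threshold of 1.0 are assumptions made for illustration, and the log-probabilities in the usage lines are invented rather than taken from any model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_ratio(
    hindi_verb_logprobs: Sequence[float],
    english_verb_logprobs: Sequence[float],
) -> float:
    """Assumed convention: PPL(English-verb sentence) / PPL(Hindi-verb sentence)."""
    return perplexity(english_verb_logprobs) / perplexity(hindi_verb_logprobs)


def preferred_variant(ratio: float) -> str:
    """Under the assumed convention, a ratio above 1 favours the Hindi verb."""
    return "hindi" if ratio > 1.0 else "english"


# Invented log-probabilities for the two sentences of one minimal pair.
hindi_variant = [math.log(0.5)] * 3
english_variant = [math.log(0.25)] * 3

ratio = perplexity_ratio(hindi_variant, english_variant)
choice = preferred_variant(ratio)
```

In practice the log-probabilities would come from scoring each sentence of a pair with a language model, and the resulting preference could then be compared against the native speaker judgment for that pair.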