Johnatan E. Bonilla
2026
Beyond Acoustics: Isolating Dialectal and Sociolinguistic Bias in Spanish ASR
Johnatan E. Bonilla
Proceedings of the Seventh Workshop on Natural Language Processing and Computational Social Science
Large-scale ASR systems such as Whisper achieve competitive aggregate Word Error Rate (WER) on multilingual benchmarks, but this aggregate conceals systematic disparities across speaker populations. We evaluate Whisper large-v3 on 276 recordings from the Corpus Oral y Sonoro del Español Rural (COSER), a dialectological archive of elderly rural speakers across all Spanish provinces. WER is computed separately for Informants and Interviewers within each recording, revealing that mixed-role evaluation underestimates Informant WER in the majority of provinces, with the largest corrections in southern areas. Negative Binomial regression with cluster-robust standard errors shows that Andalusia and Extremadura generate significantly more Informant errors than the Castilian heartland (Andalusia IRR = 1.20, p < 0.001; Extremadura IRR = 1.24, p = 0.020), while no geographic predictor reaches significance for Interviewers sharing the same recording environment. Male Informants generate 12.5% more errors than females after geographic adjustment (p < 0.001), consistent with differential vernacular retention in traditional rural communities. The geographic pattern aligns with established dialectological classifications of Peninsular Spanish. These results demonstrate that role-disaggregated evaluation is a necessary methodological prerequisite for fairness audits of ASR systems applied to sociolinguistically diverse corpora: aggregate benchmarks systematically suppress disparities that are borne disproportionately by the most underrepresented speaker populations, and their use in isolation constitutes both an allocative harm and a measurement failure.
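The role-disaggregated evaluation described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's pipeline: the function names, the lowercase/whitespace tokenization, and the two toy turns are assumptions made for illustration, and no COSER data is used. It computes word-level edit distance per turn and aggregates errors separately per speaker role and for the pooled (mixed-role) transcript.

```python
from collections import defaultdict


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_role(turns):
    """Return WER per role plus a 'pooled' WER over all turns.

    turns: iterable of (role, reference_text, hypothesis_text).
    Tokenization (lowercase + whitespace split) is an assumption.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for role, ref, hyp in turns:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        dist = word_edit_distance(ref_tokens, hyp_tokens)
        for key in (role, "pooled"):
            errors[key] += dist
            words[key] += len(ref_tokens)
    return {key: errors[key] / words[key] for key in words}


# Hypothetical toy turns, invented for this example.
turns = [
    ("interviewer", "cómo se hacía el pan", "cómo se hacía el pan"),
    ("informant", "pues amasábamos la harina", "pues amasamos harina"),
]
result = wer_by_role(turns)
```

On these toy turns the Informant WER is 0.50 (one substitution and one deletion over four reference words), while the pooled WER is about 0.22, because the error-free Interviewer words dilute the Informant's errors. This is the masking effect the abstract attributes to mixed-role evaluation.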
LLM-Adapted Colombian Spanish Lexicography: Proficiency Control, Hallucination, and Cultural Distortion
Johnatan E. Bonilla
Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026)
We evaluate whether open-source LLMs can produce proficiency-graded English adaptations of entries from the Diccionario de colombianismos (DiCol), a Colombian Spanish lexicographic resource used in language teaching. Three 7–8B instruction-tuned models (Llama 3.1, Qwen2.5, and Mistral) generate Beginner, Intermediate, and Advanced translations for all 8,252 definitions using structured zero-shot prompts identical across levels except for the target CEFR band. Automated metrics show that Intermediate targeting collapses (73–83% classified as Advanced by vocabulary, χ² > 705, p < .001) and that Advanced outputs expand 4.9–8.2× relative to the source. Expert annotation of a 360-entry stratified sample (κ = 0.61–0.68) identifies hallucination in 19% of entries (Fleiss’ κ = 0.77 for cultural preservation categories, 97% unanimity among flagged cases). Hallucination concentrates in the Advanced condition (81%, χ² = 86.6, p < .001) and is associated with higher expansion (U = 16,662, p < .001, r = 0.68), manifesting primarily as generic elaboration and, in a smaller proportion, as Colombia-stereotyping and pragmatic polarity inversion. We discuss these findings through the lens of (CITATION)’s domestication framework and describe the observed pattern as algorithmic domestication.
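The expansion figures reported above compare the length of an adaptation with the length of its source definition. The abstract does not state the unit of length, so the following Python sketch assumes whitespace-delimited tokens. The function name and the example strings are hypothetical and are not taken from DiCol or from any model output.

```python
def expansion_ratio(source: str, adaptation: str) -> float:
    """Length of the adaptation divided by length of the source.

    Length is counted in whitespace-delimited tokens; this unit is an
    assumption, since the abstract does not specify it.
    """
    source_len = len(source.split())
    if source_len == 0:
        raise ValueError("source definition is empty")
    return len(adaptation.split()) / source_len


# Invented toy strings for illustration only.
source = "Bebida caliente de panela"
adaptation = "A hot drink made from unrefined cane sugar dissolved in water"
ratio = expansion_ratio(source, adaptation)
```

Here the adaptation has 11 tokens against 4 in the source, giving a ratio of 2.75. A ratio computed across languages also reflects ordinary Spanish-to-English length differences, so it is most informative when compared between proficiency conditions, not read as an absolute measure.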