Annabella Sakunkoo
2026
Through the Looking Glass of Multilingual AI: Contrasting Language- and Name Script-Dependent Ethnic Hierarchies in GPT and DeepSeek
Annabella Sakunkoo | Jonathan Sakunkoo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Large language models (LLMs) are increasingly used as evaluative tools across languages, yet bias research remains overwhelmingly Anglocentric, with most studies conducted in English using Latin-script names. It remains unclear whether bias patterns generalize across linguistic contexts. We investigate this question and introduce the stereotype perceptual map, a framework for analyzing how ethnic groups are positioned along evaluative dimensions. Using 900,000 model responses over 45,000 name variations spanning 9 ethnicities, we evaluate model behavior across prompt languages (English, Chinese, Thai), writing scripts (Latin, Chinese, Thai), evaluative domains (competence, warmth), and models (GPT, DeepSeek). We find that ethnic bias hierarchies are jointly shaped by local linguistic context and model origin and differ substantially between Western-centric and Sinocentric models. DeepSeek exhibits highly stable rankings across conditions in math competence judgments, consistently placing Chinese names at the top, followed by Russian names, with White, Hispanic, and Black names at the bottom. GPT, by contrast, shows strong script-dependent reordering: Latin-script conditions form one stable cluster, while native-script conditions form another, with substantially lower cross-cluster correlations. We term this script-gated bias: transliterating the same names into a non-Latin script can activate a different evaluative frame and produce rankings that are sometimes inversely correlated with Latin-script results. Warmth evaluations are less stable than competence evaluations across both models. Our findings demonstrate that multilingual bias cannot be characterized through single-language, single-script audits. For multilingual users, code-switching between languages can toggle between different bias regimes.
Fairness evaluations for multilingual LLMs must therefore account for deployment language, writing system, and model origin to capture the full range of potentially harmful bias these systems exhibit.
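The cross-condition comparison described in this abstract can be illustrated with a rank correlation between two conditions. The sketch below assumes a Spearman correlation, which the abstract does not specify, and uses placeholder group labels and made-up scores rather than any data from the paper.

```python
def ranks(scores):
    """Return 1-based ranks (1 = highest score). Assumes no tied scores."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {group: i + 1 for i, group in enumerate(order)}


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two conditions (no-ties formula)."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[g] - rb[g]) ** 2 for g in ra)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical mean scores per group under two script conditions.
latin_scores = {"G1": 0.9, "G2": 0.7, "G3": 0.5, "G4": 0.3, "G5": 0.1}
native_scores = {"G1": 0.1, "G2": 0.3, "G3": 0.5, "G4": 0.7, "G5": 0.9}

same = spearman(latin_scores, latin_scores)        # identical ordering
flipped = spearman(latin_scores, native_scores)    # fully reversed ordering
```

A value near 1 means two conditions order the groups the same way; a negative value corresponds to the inverted rankings the abstract describes for some script changes.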
2025
Mind the Gap: Computational Quality Assurance of Crowd-Sourced Linguistic Knowledge on Latin and Italian Morphological Gaps
Jonathan Sakunkoo | Annabella Sakunkoo
Proceedings of the Society for Computation in Linguistics 2025
Lingdex.org: Leveraging LLMs to Structure and Explore Linguistic Olympiad Puzzles for Learning and Teaching Linguistics
Jonathan Sakunkoo | Annabella Sakunkoo
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Linguistics Olympiad puzzles provide a valuable but underutilized resource for teaching linguistic reasoning, typology, and cross-cultural understanding. Many of these puzzles feature endangered and low-resource languages and thus offer a rare opportunity to integrate linguistic diversity into education at a time when over 40% of the world’s languages face extinction. This paper presents Lingdex, a novel web-based platform that leverages large language models (LLMs) to classify, organize, and enliven Linguistics Olympiad problems across various linguistic categories such as syntax, morphology, semantics, phonology, and language families. By applying NLP techniques to the multilingual and multicultural corpora of linguistics puzzles drawn from international and national Olympiads, Lingdex supports language and linguistics education, problem-based learning, and curriculum development. The visual, interactive platform also includes problems based on endangered and rare languages to raise awareness and interest in linguistic diversity. We present results from a user study that shows increased learner interest and appreciation for global linguistic richness.
Name of Thrones: How Do LLMs Rank Student Names in Status Hierarchies Based on Race and Gender?
Annabella Sakunkoo | Jonathan Sakunkoo
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Across cultures, names tell a lot about their bearers as they carry deep personal, historical, and cultural significance. Names have also been found to serve as powerful signals of gender, race, and status in the social hierarchy, a pecking order in which individual positions shape others’ expectations of their competence and worth (Podolny, 2005). With the widespread adoption of Large Language Models (LLMs) in education and given that names are often an input for LLMs, it is crucial to evaluate whether LLMs may sort students into status positions based on first and last names and, if so, whether it is in an unfair, biased fashion. While prior work has primarily investigated biases in first names, little attention has been paid to last names and even less to the combined effects of first and last names. In this study, we conduct a large-scale analysis with bootstrap standard errors of 45,000 name variations across 5 ethnicities to examine how AI-generated responses exhibit systemic name biases. Our study investigates three key characteristics of inequality and finds that LLMs reflect, construct, and reinforce status hierarchies based on names that signal gender and ethnicity as they encode differential expectations of competence, leadership, and economic potential. Contrary to the common assumption that AI tends to favor Whites, we show that East and, in some contexts, South Asian names receive higher rankings. We also disaggregate Asians, a population projected to be the largest immigrant group in the U.S. by 2055. Our results challenge the monolithic Asian model minority assumption, illustrating a more complex and stratified model of bias. Additionally, spanning cultural categories by adopting Western first names improves AI-perceived status for East and Southeast Asian students, particularly for girls.
Our findings underscore the importance of intersectional and more nuanced understandings of race, gender, and mixed identities in the evaluation of LLMs, rather than relying on broad, monolithic, and mutually exclusive categories. By examining LLM bias and discrimination in our multicultural contexts, our study illustrates potential harms of using LLMs in education as they do not merely reflect implicit biases but also actively construct new social hierarchies that can unfairly shape long-term life trajectories. An LLM that systematically assigns lower grades or subtly less favorable evaluations to students with certain name signals reinforces a tiered system of privilege and opportunity. Some groups may face structural disadvantages, while others encounter undue pressure from inflated expectations.
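The bootstrap standard errors mentioned in this abstract can be illustrated with a generic nonparametric bootstrap of a group mean. This is a minimal sketch under assumptions: the function name, the resample count, and the toy ratings are hypothetical, and the paper's actual resampling scheme is not described here.

```python
import random
from statistics import mean, stdev


def bootstrap_se(values, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(values)
    resampled_means = [
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    ]
    # The spread of the resampled means approximates the standard error.
    return stdev(resampled_means)


# Hypothetical ratings for one name group (not data from the paper).
ratings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
se = bootstrap_se(ratings)
se_constant = bootstrap_se([3.0] * 10)  # no variation, so no sampling error
```

Comparing group means against standard errors of this kind is one way to judge whether a gap between two name groups exceeds resampling noise.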
Lost and Found: Computational Quality Assurance of Crowdsourced Knowledge on Morphological Defectivity in Wiktionary
Jonathan Sakunkoo | Annabella Sakunkoo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Morphological defectivity is an intriguing and understudied phenomenon in linguistics. Addressing defectivity, where expected inflectional forms are absent, is essential for improving the accuracy of NLP tools in morphologically rich languages. However, traditional linguistic resources often lack coverage of morphological gaps as such knowledge requires significant human expertise and effort to document and verify. For scarce linguistic phenomena in under-explored languages, Wikipedia and Wiktionary are often among the few accessible resources. Despite their extensive reach, their reliability has been a subject of controversy. This study customizes a novel neural morphological analyzer to annotate Latin and Italian corpora. Using the massive annotated data, crowd-sourced lists of defective verbs compiled from Wiktionary are validated computationally. Our results indicate that while Wiktionary provides a highly reliable account of Italian morphological gaps, 7% of Latin lemmata listed as defective show strong corpus evidence of being non-defective. This discrepancy highlights potential limitations of crowd-sourced wikis as definitive sources of linguistic knowledge, particularly for less-studied phenomena and languages, despite their value as resources for rare linguistic features. By providing scalable tools and methods for quality assurance of crowd-sourced data, this work advances computational morphology and expands linguistic knowledge of defectivity in non-English, morphologically rich languages.
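The validation idea in this abstract, checking a crowd-sourced list of defective verbs against an annotated corpus, might be sketched as follows. Everything here is an assumption for illustration: the `(lemma, features)` token format, the `min_count` threshold, and the placeholder lemmata are hypothetical and do not reproduce the paper's analyzer or criteria.

```python
from collections import Counter


def flag_non_defective(annotated_tokens, gap_slots, min_count=5):
    """Return lemmata whose supposedly missing forms are attested in the corpus.

    annotated_tokens: iterable of (lemma, features) pairs from an analyzer.
    gap_slots: maps each listed lemma to the feature bundles claimed absent.
    min_count: hypothetical threshold for "strong corpus evidence".
    """
    counts = Counter(annotated_tokens)
    flagged = {}
    for lemma, slots in gap_slots.items():
        attested = sum(counts[(lemma, slot)] for slot in slots)
        if attested >= min_count:
            flagged[lemma] = attested
    return flagged


# Placeholder corpus annotations (not real Latin or Italian data).
tokens = (
    [("lemma_a", "ind.pres.1sg")] * 6
    + [("lemma_a", "inf")] * 3
    + [("lemma_b", "ind.pres.1sg")] * 1
)
listed_gaps = {"lemma_a": ["ind.pres.1sg"], "lemma_b": ["ind.pres.1sg"]}
suspect = flag_non_defective(tokens, listed_gaps)
```

In this toy input, `lemma_a` is flagged because its supposedly absent form occurs six times, while `lemma_b` stays on the defective list because a single occurrence falls below the threshold.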