Annabella Sakunkoo


2025

Lost and Found: Computational Quality Assurance of Crowdsourced Knowledge on Morphological Defectivity in Wiktionary
Jonathan Sakunkoo | Annabella Sakunkoo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Morphological defectivity is an intriguing and understudied phenomenon in linguistics. Addressing defectivity, where expected inflectional forms are absent, is essential for improving the accuracy of NLP tools for morphologically rich languages. However, traditional linguistic resources often lack coverage of morphological gaps, as such knowledge requires significant human expertise and effort to document and verify. For scarce linguistic phenomena in under-explored languages, Wikipedia and Wiktionary are often among the few accessible resources. Despite their extensive reach, their reliability has been a subject of controversy. This study customizes a novel neural morphological analyzer to annotate Latin and Italian corpora. Using this large body of annotated data, crowd-sourced lists of defective verbs compiled from Wiktionary are validated computationally. Our results indicate that while Wiktionary provides a highly reliable account of Italian morphological gaps, 7% of the Latin lemmata listed as defective show strong corpus evidence of being non-defective. This discrepancy highlights the potential limitations of crowd-sourced wikis as definitive sources of linguistic knowledge, particularly for less-studied phenomena and languages, despite their value as resources for rare linguistic features. By providing scalable tools and methods for quality assurance of crowd-sourced data, this work advances computational morphology and expands linguistic knowledge of defectivity in non-English, morphologically rich languages.
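
To make the validation idea concrete, here is a minimal sketch (not the authors' pipeline) of how corpus annotations could be used to flag lemmata that Wiktionary lists as defective but whose supposedly missing forms are well attested. The data format, tag strings, function names, and attestation threshold are all hypothetical.

```python
from collections import Counter

# Illustrative sketch, not the paper's actual method: compare Wiktionary's
# claimed inflectional gaps against attestation counts from a morphologically
# annotated corpus. Annotations are assumed to be (lemma, tag) pairs.

def find_suspect_lemmata(annotations, wiktionary_defective, missing_slots,
                         min_attestations=3):
    """Return lemmata whose 'missing' slots are attested at least min_attestations times."""
    counts = Counter(annotations)          # (lemma, tag) -> corpus frequency
    suspects = {}
    for lemma in wiktionary_defective:
        attested = {
            slot: counts[(lemma, slot)]
            for slot in missing_slots.get(lemma, [])
            if counts[(lemma, slot)] >= min_attestations
        }
        if attested:                        # corpus evidence contradicts the listed gap
            suspects[lemma] = attested
    return suspects

# Toy usage with invented data: a lemma listed with a missing perfect slot
# that the annotated corpus nevertheless attests several times.
annotations = [("aio", "V;PRS;1;SG")] * 10 + [("aio", "V;PRF;1;SG")] * 4
print(find_suspect_lemmata(annotations, {"aio"}, {"aio": ["V;PRF;1;SG"]}))
```

Under this kind of check, a lemma would count toward the reported 7% only if the corpus repeatedly attests forms that the crowd-sourced entry claims do not exist.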

Name of Thrones: How Do LLMs Rank Student Names in Status Hierarchies Based on Race and Gender?
Annabella Sakunkoo | Jonathan Sakunkoo
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Across cultures, names reveal a great deal about their bearers, carrying deep personal, historical, and cultural significance. Names have also been found to serve as powerful signals of gender, race, and status in the social hierarchy, a pecking order in which an individual's position shapes others' expectations of their competence and worth (Podolny, 2005). With the widespread adoption of Large Language Models (LLMs) in education, and given that names are often an input to LLMs, it is crucial to evaluate whether LLMs sort students into status positions based on first and last names and, if so, whether they do so in an unfair, biased fashion. While prior work has primarily investigated biases in first names, little attention has been paid to last names and even less to the combined effects of first and last names. In this study, we conduct a large-scale analysis, with bootstrap standard errors, of 45,000 name variations across 5 ethnicities to examine how AI-generated responses exhibit systemic name biases. Our study investigates three key characteristics of inequality and finds that LLMs reflect, construct, and reinforce status hierarchies based on names that signal gender and ethnicity, encoding differential expectations of competence, leadership, and economic potential. Contrary to the common assumption that AI tends to favor Whites, we show that East Asian and, in some contexts, South Asian names receive higher rankings. We also disaggregate Asians, a population projected to be the largest immigrant group in the U.S. by 2055. Our results challenge the monolithic Asian model minority assumption, illustrating a more complex and stratified model of bias. Additionally, spanning cultural categories by adopting Western first names improves AI-perceived status for East and Southeast Asian students, particularly for girls. Our findings underscore the importance of intersectional and more nuanced understandings of race, gender, and mixed identities in the evaluation of LLMs, rather than reliance on broad, monolithic, and mutually exclusive categories. By examining LLM bias and discrimination in multicultural contexts, our study illustrates the potential harms of using LLMs in education: they do not merely reflect implicit biases but actively construct new social hierarchies that can unfairly shape long-term life trajectories. An LLM that systematically assigns lower grades or subtly less favorable evaluations to students with certain name signals reinforces a tiered system of privilege and opportunity. Some groups may face structural disadvantages, while others encounter undue pressure from inflated expectations.
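
As an aside on the statistics mentioned above, the following is a minimal sketch of a nonparametric bootstrap standard error of the kind the abstract refers to; the ratings below are hypothetical and do not come from the paper.

```python
import random
import statistics

# Illustrative only: bootstrap standard error of the mean of a set of
# hypothetical LLM-assigned ratings for one name group.

def bootstrap_se(scores, n_resamples=2000, seed=0):
    """Standard error of the mean via resampling with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(scores) for _ in scores]   # same size, with replacement
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)

scores = [3.8, 4.1, 3.5, 4.4, 3.9, 4.0, 3.7, 4.2]          # hypothetical ratings
print(f"mean = {statistics.fmean(scores):.2f}, bootstrap SE = {bootstrap_se(scores):.3f}")
```

The point of such resampling is simply to attach an uncertainty estimate to each group-level mean before comparing groups, so that differences in AI-perceived status are not over-read from sampling noise.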