Chirag Nagpal
2025
Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation
Kristian Lum | Jacy Reese Anthis | Kevin Robinson | Chirag Nagpal | Alexander Nicholas D’Amour
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Standard bias benchmarks used for large language models (LLMs) measure the association between social attributes in model inputs and single-word model outputs. We test whether these benchmarks are robust to lengthening the model outputs via a more realistic user prompt, in the commonly studied domain of gender-occupation bias, as a step towards measuring Realistic Use and Tangible Effects (i.e., RUTEd evaluations). From the current literature, we adapt three standard metrics of next-word prediction (neutrality, skew, and stereotype), and we develop analogous RUTEd evaluations in three contexts of real-world LLM use: children’s bedtime stories, user personas, and English language learning exercises. We find that standard bias metrics have no significant correlation with long-form output metrics. For example, selecting the least biased model based on the standard “trick tests” coincides with selecting the least biased model based on longer output no more often than random chance. There may not yet be evidence to justify standard benchmarks as reliable proxies of real-world biases, and we encourage further development of context-specific RUTEd evaluations.
2017
An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Chirag Nagpal | Kyle Miller | Benedikt Boecking | Artur Dubrawski
Proceedings of the 3rd Workshop on Noisy User-generated Text
Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as ‘escort advertisements’ on various online forums. Given the large, heterogeneous, and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper, we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain-specific characteristics of our resolved entities.