Kelly Chen
2026
Identity-Robust Language Model Generation via Content Integrity Preservation
Miao Zhang | Kelly Chen | Mehrab Tanjim | Rumi Chunara
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, despite factual knowledge being robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thus maintaining output content integrity. Experiments across four benchmarks and 18 sociodemographic identities show that the framework reduces identity-dependent bias by an average of 66.3% compared to vanilla prompting and outperforms existing prompt-based defenses. Our work addresses a critical gap in mitigating the impact of user identity cues in prompts on core generation quality.
2025
Spoken Document Retrieval for an Unwritten Language: A Case Study on Gormati
Sanjay Booshanam | Kelly Chen | Ondrej Klejch | Thomas Reitmaier | Dani Kalarikalayil Raju | Electra Wallington | Nina Markl | Jennifer Pearson | Matt Jones | Simon Robinson | Peter Bell
Findings of the Association for Computational Linguistics: EMNLP 2025
Speakers of unwritten languages have the potential to benefit from speech-based automatic information retrieval systems. This paper proposes a speech embedding technique that facilitates such a system and can be used in a zero-shot manner on the target language. After conducting development experiments on several written Indic languages, we evaluate our method on a corpus of Gormati – an unwritten language – that was previously collected in partnership with an agrarian Banjara community in Maharashtra State, India, specifically for the purposes of information retrieval. Our system achieves a top-5 retrieval rate of 87.9% on this data, giving hope that it may be usable by speakers of unwritten languages worldwide.