Vera Neplenbroek

2026

One Persona, Many Cues, Different Results: How Sociodemographic Cues Impact LLM Personalization
Franziska Weeber | Vera Neplenbroek | Jan Batzner | Sebastian Pad\'o
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Personalization of LLMs by sociodemographic subgroup often improves user experience, but can also introduce or amplify biases and unfairoutcomes across groups. Prior work has employed so-called personas, sociodemographic user attributes conveyed to a model, to studybias in LLMs by relying on a single cue to prompt a persona, such as user names or explicit attribute mentions. This disregards LLM sensitivity to prompt variation and the rarity of some cues in real interactions (external validity). We compare six commonly used personacues across seven open and proprietary LLMs on four writing and advice tasks. While cues are overall highly correlated, they produce sub-stantial variance in responses across personas that can change findings on persona-induced differences and bias. We therefore cautionagainst claims based on single persona cues, especially when they are overly explicit and have low external validity.

2025

pdf bib abs

Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization
Vera Neplenbroek | Arianna Bisazza | Raquel Fernández
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Generative Large Language Models (LLMs) infer user’s demographic information from subtle cues in the conversation — a phenomenon called implicit personalization. Prior work has shown that such inferences can lead to lower quality responses for users assumed to be from minority groups, even when no demographic information is explicitly provided. In this work, we systematically explore how LLMs respond to stereotypical cues using controlled synthetic conversations, by analyzing the models’ latent user representations through both model internals and generated answers to targeted user questions. Our findings reveal that LLMs do infer demographic attributes based on these stereotypical signals, which for a number of groups even persists when the user explicitly identifies with a different demographic group. Finally, we show that this form of stereotype-driven implicit personalization can be effectively mitigated by intervening on the model’s internal representations using a trained linear probe to steer them toward the explicitly stated identity. Our results highlight the need for greater transparency and control in how LLMs represent user identity.

pdf bib abs

Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Vera Neplenbroek | Arianna Bisazza | Raquel Fernández
Findings of the Association for Computational Linguistics: ACL 2025

Recent generative large language models (LLMs) show remarkable performance in non-English languages, but when prompted in those languages they tend to express higher harmful social biases and toxicity levels. Prior work has shown that finetuning on specialized datasets can mitigate this behavior, and doing so in English can transfer to other languages. In this work, we investigate the impact of different finetuning methods on the model’s bias and toxicity, but also on its ability to produce fluent and diverse text. We reduce biases by finetuning on curated non-harmful text, but find only direct preference optimization to be effective for mitigating toxicity. The mitigation caused by applying these methods in English also transfers to non-English languages. We find evidence that the extent to which transfer takes place can be predicted by the amount of data in a given language present in the model’s pretraining data. However, this transfer of bias and toxicity mitigation often comes at the expense of decreased language generation ability in non-English languages, highlighting the importance of developing language-specific bias and toxicity mitigation methods.

There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case of proprietary models. We provide JUDGE-BENCH, an extensible collection of 20 NLP datasets with human annotations covering a broad range of evaluated properties and types of data, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show substantial variance across models and datasets. Models are reliable evaluators on some tasks, but overall display substantial variability depending on the property being evaluated, the expertise level of the human judges, and whether the language is human or model-generated. We conclude that LLMs should be carefully validated against human judgments before being used as evaluators.