Erik Derner

2025

Gender Representation Bias Analysis in LLM-Generated Czech and Slovenian Texts
Erik Derner | Kristina Batistič
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)

Large language models (LLMs) often reflect social biases present in their training data, including imbalances in how different genders are represented. While most prior work has focused on English, gender representation bias remains underexplored in morphologically rich languages where grammatical gender is pervasive. We present a method for detecting and quantifying such bias in Czech and Slovenian, using LLMs to classify gendered person references in LLM-generated narratives. Applying this method to outputs from a range of models, we find substantial variation in gender balance. While some models produce near-equal proportions of male and female references, others exhibit strong male overrepresentation. Our findings highlight the need for fine-grained bias evaluation in under-represented languages and demonstrate the potential of LLM-based annotation in this space. We make our code and data publicly available.

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
Erik Derner | Sara Sansalvador De La Fuente | Yoan Gutierrez | Paloma Moreda Pozo | Nuria M Oliver
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias – the association of specific roles or traits with a particular gender – in English and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias – the unequal frequency of references to individuals of different genders – in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs’ contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs but, surprisingly, can be mitigated through small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.

Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs
Erik Derner | Kristina Batistič
Proceedings of the First Workshop on LLM Security (LLMSEC)

Multimodal large language models (MLLMs) are increasingly deployed in real-world applications, yet their safety remains underexplored, particularly in multilingual and visual contexts. In this work, we present a systematic red teaming framework to evaluate MLLM safeguards using adversarial prompts translated into seven languages and delivered via four input modalities: plain text, jailbreak prompt + text, text rendered as an image, and jailbreak prompt + text rendered as an image. We find that rendering prompts as images increases attack success rates and reduces refusal rates, with the effect most pronounced in lower-resource languages such as Slovenian, Czech, and Valencian. Our results suggest that vision-based multilingual attacks expose a persistent gap in current alignment strategies, highlighting the need for robust multilingual and multimodal MLLM safety evaluation and mitigation of these risks. We make our code and data available.

Guardians of Trust: Risks and Opportunities for LLMs in Mental Health
Miguel Baidal | Erik Derner | Nuria Oliver
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)

The integration of large language models (LLMs) into mental health applications offers promising opportunities for positive social impact. However, it also presents critical risks. While previous studies have often addressed these challenges and risks individually, a broader and multi-dimensional approach is still lacking. In this paper, we introduce a taxonomy of the main challenges related to the use of LLMs for mental health and propose a structured, comprehensive research agenda to mitigate them. We emphasize the need for explainable, emotionally aware, culturally sensitive, and clinically aligned systems, supported by continuous monitoring and human oversight. By placing our work within the broader context of natural language processing (NLP) for positive impact, this research contributes to ongoing efforts to ensure that technological advances in NLP responsibly serve vulnerable populations, fostering a future where mental health solutions improve rather than endanger well-being.
