Wolf Tilo Balke
Also published as: Wolf-Tilo Balke
2024
Analyzing Effects of Learning Downstream Tasks on Moral Bias in Large Language Models
Niklas Kiehne
|
Alexander Ljapunov
|
Marc Bätje
|
Wolf-Tilo Balke
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Pre-training and fine-tuning large language models (LMs) is currently the state-of-the-art methodology for enabling data-scarce downstream tasks. However, the derived models still tend to replicate and perpetuate social biases. To understand this process in more detail, this paper investigates the actual effects of learning downstream tasks on moral bias in LMs. We develop methods to assess the agreement of LMs to explicitly codified norms in both pre-training and fine-tuning stages. Even if a pre-trained foundation model exhibits consistent norms, we find that introducing downstream tasks may indeed lead to unexpected inconsistencies in norm representation. Specifically, we observe two phenomena during fine-tuning across both masked and causal LMs: (1) pre-existing moral bias may be mitigated or amplified even when presented with opposing views and (2) prompt sensitivity may be negatively impacted. We provide empirical evidence of models deteriorating into conflicting states, where contradictory answers can easily be triggered by slight modifications in the input sequence. Our findings thus raise concerns about the general ability of LMs to mitigate moral biases effectively.
2022
Quantifying Bias from Decoding Techniques in Natural Language Generation
Mayukh Das
|
Wolf Tilo Balke
Proceedings of the 29th International Conference on Computational Linguistics
Natural language generation (NLG) models can propagate social bias towards particular demography. Though several studies investigated bias from data and model, NLG task distinctively uses stochastic decoder that can positively or negatively impact the bias-sensitive tokens initially predicted by the model. To address this gap in research, we present an extensive analysis of bias from decoding techniques for open-domain language generation considering the entire decoding space. We analyze to what extent bias metrics like toxicity and sentiment are impacted by the individual components of decoder algorithms. To this extent, we also analyze the trade-off between bias scores and human-annotated generation quality throughout the decoder space. Together, these methods reveal the imperative of testing inference time bias and provide evidence on the usefulness of inspecting the entire decoding spectrum.
Contextualizing Language Models for Norms Diverging from Social Majority
Niklas Kiehne
|
Hermann Kroll
|
Wolf-Tilo Balke
Findings of the Association for Computational Linguistics: EMNLP 2022
To comprehensibly contextualize decisions, artificial systems in social situations need a high degree of awareness of the rules of conduct of human behavior. Especially transformer-based language models have recently been shown to exhibit some such awareness. But what if norms in some social setting do not adhere to or even blatantly deviate from the mainstream? In this paper, we introduce a novel mechanism based on deontic logic to allow for a flexible adaptation of individual norms by de-biasing training data sets and a task-reduction to textual entailment. Building on the popular ‘Moral Stories’ dataset we on the one hand highlight the intrinsic bias of current language models, on the other hand characterize the adaptability of pre-trained models to deviating norms in fine-tuning settings.
Search