Understanding behaviour of large language models for short-term and long-term fairness scenarios
Talha Chafekar | Aafiya Hussain | Chon In Cheong
Proceedings of the 20th International Conference on Natural Language Processing (ICON), 2023
Large language models (LLMs) have become increasingly accessible online and can easily be used to generate synthetic data. As their capabilities rise and their use in automating tasks spreads across domains, it is crucial to understand the fairness notions harboured by these models. Our work explores the consistency and behaviour of GPT-3.5 and GPT-4 in both short-term and long-term scenarios through the lens of fairness. Additionally, this study investigates prompt template designs aimed at equalized opportunities. In the short-term scenario on the German Credit dataset, an intervention on a key feature increased the loan rejection rate by 37.15% for GPT-3.5 and 49.52% for GPT-4. In the long-term scenario using ML-fairness-gym, adding extra information about the environment to the prompt showed no improvement over a minimal-information prompt in terms of final credit distributions. However, adding extra features to the prompt increased the profit rate by 6.41 percentage points (from 17.2% to 23.6%) compared to a baseline maximum-reward classifier, while compromising group-level recall rates.
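The short-term experiment can be pictured as a counterfactual intervention: fix one key feature of every applicant, re-query the model, and compare loan rejection rates before and after. The sketch below illustrates that measurement only; `query_llm` is a hypothetical stand-in for a real GPT-3.5/GPT-4 API call (here a deterministic stub), and the feature names and applicant records are invented for illustration, not taken from the paper.

```python
# Hedged sketch (not the authors' code): measure how a single-feature
# intervention shifts an LLM's loan-rejection rate, in the spirit of the
# short-term German Credit scenario described in the abstract.

def query_llm(applicant: dict) -> str:
    # Hypothetical stub standing in for a chat-model call; a real run
    # would format `applicant` into a prompt and parse the reply.
    return "reject" if applicant["credit_history"] == "bad" else "approve"

def rejection_rate(applicants, llm=query_llm) -> float:
    # Fraction of applicants the model rejects.
    decisions = [llm(a) for a in applicants]
    return sum(d == "reject" for d in decisions) / len(decisions)

def intervene(applicants, feature, value):
    # Counterfactual: force one key feature to a fixed value for everyone.
    return [{**a, feature: value} for a in applicants]

# Invented toy records for illustration only.
applicants = [
    {"credit_history": "good", "amount": 1200},
    {"credit_history": "good", "amount": 5000},
    {"credit_history": "bad", "amount": 800},
    {"credit_history": "bad", "amount": 3000},
]

base = rejection_rate(applicants)
after = rejection_rate(intervene(applicants, "credit_history", "bad"))
delta_pp = (after - base) * 100  # change in rejection rate, percentage points
```

With the stub above, `base` is 0.5, `after` is 1.0, and `delta_pp` is 50.0; the paper's reported 37.15% and 49.52% figures come from running this kind of comparison against the actual models.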