Abstract
The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g., political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with the persona's demographic rather than the target stance. Models we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.
- Anthology ID: 2024.findings-acl.586
- Volume: Findings of the Association for Computational Linguistics ACL 2024
- Month: August
- Year: 2024
- Address: Bangkok, Thailand and virtual meeting
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9832–9850
- URL: https://aclanthology.org/2024.findings-acl.586
- DOI: 10.18653/v1/2024.findings-acl.586
- Cite (ACL): Andy Liu, Mona Diab, and Daniel Fried. 2024. Evaluating Large Language Model Biases in Persona-Steered Generation. In Findings of the Association for Computational Linguistics ACL 2024, pages 9832–9850, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
- Cite (Informal): Evaluating Large Language Model Biases in Persona-Steered Generation (Liu et al., Findings 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2024.findings-acl.586.pdf