Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement

Wenda Xu; Guanglei Zhu; Xuandong Zhao; Liangming Pan; Lei Li; William Wang

doi:10.18653/v1/2024.acl-long.826

Abstract

Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We find that this discrepancy is due to LLMs' bias in evaluating their own output. In this paper, we formally define an LLM's self-bias, the tendency to favor its own generation, using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. We also find that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias.

Anthology ID: 2024.acl-long.826
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15474–15492
URL: https://preview.aclanthology.org/ingest_wac_2008/2024.acl-long.826/
DOI: 10.18653/v1/2024.acl-long.826
Cite (ACL): Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, and William Wang. 2024. Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15474–15492, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement (Xu et al., ACL 2024)
PDF: https://preview.aclanthology.org/ingest_wac_2008/2024.acl-long.826.pdf
