Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation
Pardis Sadat Zahraei, Xiaoning Wang, Nimet Beyza Bozdag, Gokhan Tur, Dilek Hakkani-Tür
Abstract
Large Language Models (LLMs) are increasingly used as judges to evaluate text quality, moderate content, and assess arguments. We investigate whether alignment-instilled prior beliefs bias LLM judgments, using persuasion evaluation as a representative task. We find a systematic failure: models conflate their trained beliefs with rhetorical quality, rating identical claims differently based on belief alignment rather than argumentative merit. A bare assertion aligned with training receives higher scores than a well-crafted counter-argument, even when the judge is explicitly instructed to evaluate rhetoric alone. We introduce ConvinceQA, a dataset of 27,756 persuasive arguments with controlled stance variation across subjective, harmful, and misinformation domains, and demonstrate this prior prejudice across models. We exploit this failure through persuasion-based probing: evaluating minimal pairs that differ only in the subject token bypasses learned refusals and reveals hidden biases. Analysis identifies three failure modes, with belief-conditioned rating inflation accounting for 88% of cases. Cross-task validation on essay quality assessment and debate judging confirms that this is a pervasive limitation.
- Anthology ID:
- 2026.findings-acl.2087
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 42049–42082
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2087/
- Cite (ACL):
- Pardis Sadat Zahraei, Xiaoning Wang, Nimet Beyza Bozdag, Gokhan Tur, and Dilek Hakkani-Tür. 2026. Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42049–42082, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation (Zahraei et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2087.pdf