Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation

Pardis Sadat Zahraei, Xiaoning Wang, Nimet Beyza Bozdag, Gokhan Tur, Dilek Hakkani-Tür


Abstract
Large Language Models (LLMs) are increasingly used as judges to evaluate text quality, moderate content, and assess arguments. We investigate whether alignment-instilled prior beliefs bias LLM judgments, using persuasion evaluation as a representative task. We find a systematic failure: models conflate their trained beliefs with rhetorical quality, rating identical claims differently based on belief alignment rather than argumentative merit. A bare assertion aligned with training receives higher scores than a well-crafted counter-argument, even when the judge is explicitly instructed to assess rhetoric alone. We introduce ConvinceQA, a dataset of 27,756 persuasive arguments with controlled stance variation across subjective, harmful, and misinformation domains, and demonstrate this prior prejudice across models. We exploit this failure through persuasion-based probing: evaluating minimal pairs that differ only in the subject token bypasses learned refusals and reveals hidden biases. Analysis identifies three failure modes, with belief-conditioned rating inflation accounting for 88% of cases. Cross-task validation on essay quality assessment and debate judging confirms that this is a pervasive limitation.
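The minimal-pair probing idea in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the template, the subject words, and the `toy_judge` function are all hypothetical stand-ins, and a real probe would replace `toy_judge` with a call to an LLM judge that returns a persuasiveness score.

```python
from typing import Callable, Tuple


def make_minimal_pair(template: str, subjects: Tuple[str, str]) -> Tuple[str, str]:
    """Fill one template with two subjects so the texts differ in a single slot."""
    first, second = subjects
    return template.format(subject=first), template.format(subject=second)


def rating_gap(judge: Callable[[str], float], pair: Tuple[str, str]) -> float:
    """Score both members of the pair and return the difference (first minus second)."""
    return judge(pair[0]) - judge(pair[1])


def toy_judge(text: str) -> float:
    """Hypothetical stand-in for an LLM judge; it simply favors one subject."""
    return 5.0 if "cats" in text else 3.0


# Hypothetical template and subjects, chosen only for illustration.
pair = make_minimal_pair("{subject} make the best pets.", ("cats", "dogs"))
gap = rating_gap(toy_judge, pair)
```

Because the two texts are identical apart from the subject, a nonzero gap cannot come from rhetorical quality and points to a preference tied to the subject itself.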
Anthology ID:
2026.findings-acl.2087
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
42049–42082
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2087/
Cite (ACL):
Pardis Sadat Zahraei, Xiaoning Wang, Nimet Beyza Bozdag, Gokhan Tur, and Dilek Hakkani-Tür. 2026. Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42049–42082, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation (Zahraei et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2087.pdf
Checklist:
 2026.findings-acl.2087.checklist.pdf