Curse of Knowledge: Your Guidance and Provided Knowledge are biasing LLM Judges in Complex Evaluation
Weiyuan Li, Xintao Wang, Siyu Yuan, Rui Xu, Jiangjie Chen, Qingqing Dong, Yanghua Xiao, Deqing Yang
Abstract
As large language models (LLMs) grow more capable, they face increasingly diverse and complex tasks, making reliable evaluation challenging. The paradigm of LLMs as judges has emerged as a scalable solution, yet prior work focuses primarily on simple settings; their reliability in complex tasks, where multi-faceted rubrics, unstructured reference answers, and nuanced criteria are critical, remains understudied. In this paper, we construct ComplexEval Bench, a challenging benchmark designed to systematically expose and quantify Auxiliary Information Induced Biases. We investigate and validate 6 previously unexplored biases across 12 basic and 3 advanced scenarios. Key findings reveal that (1) all evaluated models exhibit significant susceptibility to these biases, with bias magnitude scaling with task complexity, and (2) Large Reasoning Models (LRMs), notably, show a paradoxical vulnerability. Our in-depth analysis offers crucial insights for improving the accuracy and verifiability of evaluation signals, paving the way for more general and robust evaluation models.
- Anthology ID: 2025.findings-emnlp.805
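The abstract does not spell out a measurement protocol, so as a minimal illustration of the underlying idea only, the sketch below quantifies an auxiliary-information-induced bias as the mean score shift a judge exhibits when the same responses are rated with versus without extra provided material. The `bias_magnitude` and `toy_judge` names, prompt templates, and scores are hypothetical and not taken from the paper.

```python
# Hypothetical sketch (not the paper's protocol): measure an
# auxiliary-information-induced bias as the mean score shift when the
# same responses are judged with vs. without extra provided material.
from statistics import mean
from typing import Callable, Sequence

Judge = Callable[[str], float]  # maps a judging prompt to a numeric score

def bias_magnitude(judge: Judge, responses: Sequence[str],
                   rubric: str, aux_info: str) -> float:
    """Mean score shift caused by appending auxiliary information
    (e.g., a reference answer or guidance) to the judging prompt."""
    deltas = []
    for resp in responses:
        base = judge(f"Rubric:\n{rubric}\n\nResponse:\n{resp}")
        with_aux = judge(f"Rubric:\n{rubric}\n\n{aux_info}\n\nResponse:\n{resp}")
        deltas.append(with_aux - base)
    return mean(deltas)

# Toy stand-in judge for illustration; in practice this would wrap an LLM call.
def toy_judge(prompt: str) -> float:
    return 7.5 if "Reference answer" in prompt else 7.0

print(bias_magnitude(toy_judge, ["answer A", "answer B"],
                     rubric="Score 1-10 for factual accuracy.",
                     aux_info="Reference answer: ..."))  # prints 0.5
```

A positive value indicates the auxiliary material inflates the judge's scores; aggregating such deltas across scenarios is one plausible way to make the bias magnitudes the abstract describes comparable across models.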
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 14900–14924
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.805/
- DOI: 10.18653/v1/2025.findings-emnlp.805
- Cite (ACL): Weiyuan Li, Xintao Wang, Siyu Yuan, Rui Xu, Jiangjie Chen, Qingqing Dong, Yanghua Xiao, and Deqing Yang. 2025. Curse of Knowledge: Your Guidance and Provided Knowledge are biasing LLM Judges in Complex Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14900–14924, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Curse of Knowledge: Your Guidance and Provided Knowledge are biasing LLM Judges in Complex Evaluation (Li et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.805.pdf