Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation

Wei-Hsiang Lin, Sheng-Lun Wei, Hen-Hsen Huang, Hsin-Hsi Chen


Abstract
LLM-as-Judge frameworks are increasingly popular for AI evaluation, yet research findings on the relationship between models’ generation and judgment abilities remain inconsistent. We investigate this relationship through systematic dataset- and instance-level analyses across 11 models and 21 diverse tasks. Despite both capabilities relying on the same underlying knowledge, our analyses reveal they are only weakly correlated, primarily due to LLMs’ sensitivity to the responses being judged. To address this, we propose a self-reference-guided evaluation strategy that leverages a model’s own answers as references. This approach significantly strengthens the correlation between generation and judgment abilities, offering a practical path to align these skills and providing a reliable proxy for model selection in evaluation tasks.
Anthology ID:
2025.findings-emnlp.1342
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24651–24672
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1342/
DOI:
10.18653/v1/2025.findings-emnlp.1342
Cite (ACL):
Wei-Hsiang Lin, Sheng-Lun Wei, Hen-Hsen Huang, and Hsin-Hsi Chen. 2025. Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24651–24672, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation (Lin et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1342.pdf
Checklist:
2025.findings-emnlp.1342.checklist.pdf