Learning to Judge: LLMs Designing and Applying Evaluation Rubrics

Clemencia Siro, Pourya Aliannejadi, Mohammad Aliannejadi


Abstract
Large language models (LLMs) are increasingly used as evaluators for natural language generation, applying human-defined rubrics to assess system outputs. However, human rubrics are often static and misaligned with how models internally represent language quality. We introduce GER-Eval (Generating Evaluation Rubrics for Evaluation) to investigate whether LLMs can design and use their own evaluation rubrics. We evaluate the semantic coherence and scoring reliability of LLM-defined criteria, as well as their alignment with human criteria. We find that LLMs reliably generate interpretable, task-aware evaluation dimensions and apply them consistently within a single model, but their scoring reliability degrades in factual and knowledge-intensive settings. Closed-source models such as GPT-4o achieve higher agreement and cross-model generalization than open-weight models such as Llama. Our findings position evaluation as a learned linguistic capability of LLMs, one that is consistent within models but fragmented across them, and call for new methods that jointly model human and LLM evaluative language to improve reliability and interpretability.
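The abstract describes a two-stage setup: a model first designs a rubric for a task, then the same or a different model applies that rubric to score outputs. The sketch below illustrates that loop under stated assumptions; it is not the paper's protocol. The prompt wording, the 1-5 scale, and the `mock_llm` stub are illustrative placeholders, and a real chat-completion client would be swapped in for the stub.

```python
# Minimal sketch of a "design then apply" rubric loop, in the spirit of
# GER-Eval as summarized in the abstract. All prompts and the mock model
# are assumptions for illustration only.
from typing import Callable

LLMFn = Callable[[str], str]  # maps a prompt string to a model completion


def generate_rubric(llm: LLMFn, task: str) -> str:
    """Stage 1: ask the model to design its own evaluation rubric."""
    prompt = (
        f"You will evaluate system outputs for the task: {task}.\n"
        "Propose an evaluation rubric: list the dimensions you would score, "
        "each with a name, a one-sentence definition, and a 1-5 scale."
    )
    return llm(prompt)


def score_with_rubric(llm: LLMFn, rubric: str, output: str) -> str:
    """Stage 2: apply a rubric (the model's own, or another model's)."""
    prompt = (
        f"Rubric:\n{rubric}\n\n"
        f"Candidate output:\n{output}\n\n"
        "Score the candidate on each rubric dimension (1-5), with a brief "
        "justification per dimension."
    )
    return llm(prompt)


if __name__ == "__main__":
    # Deterministic stub so the sketch runs end to end without an API key;
    # replace with a call to an actual LLM endpoint.
    def mock_llm(prompt: str) -> str:
        if "Propose an evaluation rubric" in prompt:
            return "Relevance: on-topic? (1-5)\nFaithfulness: factually grounded? (1-5)"
        return "Relevance: 4\nFaithfulness: 3"

    rubric = generate_rubric(mock_llm, "dialogue response generation")
    print(score_with_rubric(mock_llm, rubric, "The capital of Morocco is Rabat."))
    # Cross-model generalization, as studied in the paper, amounts to scoring
    # with a second model using the same rubric and comparing the two score
    # vectors (e.g., via correlation or an agreement coefficient).
```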
Anthology ID:
2026.findings-eacl.335
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6371–6389
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.335/
Cite (ACL):
Clemencia Siro, Pourya Aliannejadi, and Mohammad Aliannejadi. 2026. Learning to Judge: LLMs Designing and Applying Evaluation Rubrics. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6371–6389, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Learning to Judge: LLMs Designing and Applying Evaluation Rubrics (Siro et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.335.pdf
Checklist:
2026.findings-eacl.335.checklist.pdf