LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation

Lukáš Eigler; Jindřich Libovický; David Hurych

Abstract

Validating evaluation metrics for natural language generation (NLG) typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. We propose LLM as a Meta-Judge, a scalable framework that uses LLMs to generate synthetic evaluation datasets via controlled semantic degradation of real data, replacing human judgment. We validate our approach using meta-correlation, which measures the alignment between metric rankings derived from synthetic data and those derived from standard human benchmarks. Experiments across Machine Translation, Question Answering, and Summarization demonstrate that synthetic validation serves as a reliable proxy for human judgment, achieving meta-correlations exceeding 0.9 in multilingual QA, and that it is a viable alternative where human judgments are unavailable or too expensive to obtain. Our code and data are publicly available at https://github.com/eiglerl/meta-judge.
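
To make the idea of meta-correlation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each metric has already been assigned one quality score from a human benchmark and one from synthetic data, and that agreement between the two resulting rankings is measured with Spearman's rank correlation; the abstract does not state which coefficient is used. The metric names and all numbers are invented for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def meta_correlation(human_quality, synthetic_quality):
    """Agreement between two metric rankings (assumed: Spearman).

    Both arguments map a metric name to a quality score, e.g. the
    metric's correlation with human or synthetic judgments.
    """
    metrics = sorted(human_quality)
    return spearman(
        [human_quality[m] for m in metrics],
        [synthetic_quality[m] for m in metrics],
    )


# Hypothetical metric names and scores, for illustration only.
human = {"metric_a": 0.62, "metric_b": 0.48, "metric_c": 0.30, "metric_d": 0.55}
synthetic = {"metric_a": 0.81, "metric_b": 0.70, "metric_c": 0.41, "metric_d": 0.66}

score = meta_correlation(human, synthetic)
print(f"meta-correlation: {score:.2f}")
```

In this toy example the two rankings agree on the best and worst metric but swap the middle two, so the meta-correlation is high without being perfect. A value near 1 would indicate that the synthetic data ranks metrics almost exactly as the human benchmark does.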

Anthology ID: 2026.acl-srw.125
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1417–1435
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.125/
Cite (ACL): Lukáš Eigler, Jindřich Libovický, and David Hurych. 2026. LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1417–1435, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation (Eigler et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.125.pdf
