OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs

Ivan Kartáč, Mateusz Lango, Ondřej Dušek

Abstract
Large Language Models (LLMs) have demonstrated great potential as evaluators of NLG systems, enabling high-quality, reference-free, and multi-aspect assessments. However, existing LLM-based metrics suffer from two major drawbacks: reliance on proprietary models to generate training data or perform evaluations, and a lack of fine-grained, explanatory feedback. We introduce OpeNLGauge, a fully open-source, reference-free NLG evaluation metric that provides accurate explanations based on individual error spans. OpeNLGauge is available either as a two-stage ensemble of larger open-weight LLMs or as a small fine-tuned evaluation model, with confirmed generalizability to unseen tasks, domains, and aspects. Our extensive meta-evaluation shows that OpeNLGauge achieves competitive correlation with human judgments, outperforming state-of-the-art models on certain tasks while maintaining full reproducibility and providing explanations that are more than twice as accurate.
Anthology ID: 2025.inlg-main.19
Volume: Proceedings of the 18th International Natural Language Generation Conference
Month: October
Year: 2025
Address: Hanoi, Vietnam
Editors: Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 292–337
URL: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.19/
Cite (ACL):
Ivan Kartáč, Mateusz Lango, and Ondřej Dušek. 2025. OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs. In Proceedings of the 18th International Natural Language Generation Conference, pages 292–337, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal):
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs (Kartáč et al., INLG 2025)
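BibTeX (sketch):
A BibTeX entry can be assembled from the metadata above. Note that the citation key kartac-etal-2025-openlgauge is an assumption (the page does not list a Bibkey); it simply follows the Anthology's usual author-year-title pattern.
% Citation key is assumed; all field values are taken from the page metadata above.
@inproceedings{kartac-etal-2025-openlgauge,
    title     = "{O}pe{NLG}auge: An Explainable Metric for {NLG} Evaluation with Open-Weights {LLM}s",
    author    = "Kartáč, Ivan and Lango, Mateusz and Dušek, Ondřej",
    editor    = "Flek, Lucie and Narayan, Shashi and Lê, Hồng Phương and Pei, Jiahuan",
    booktitle = "Proceedings of the 18th International Natural Language Generation Conference",
    month     = oct,
    year      = "2025",
    address   = "Hanoi, Vietnam",
    publisher = "Association for Computational Linguistics",
    pages     = "292--337",
    url       = "https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.19/",
}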
PDF: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.19.pdf