E-star 12B: Reliable Rubric-Following and Domain-Adaptive SLM Evaluator for Korean Industrial Settings

Yonghoon Kwon, Heondeuk Lee, Barom Kang


Abstract
Automatic evaluation in industrial settings requires models to interpret and apply natural-language rubrics reliably under language and domain shift. The challenge is compounded when reference answers are unavailable and proprietary models cannot be deployed because of data-governance constraints. We present E-Star-12B, a 12B-parameter evaluator for Korean industrial environments that jointly addresses rubric following and domain adaptation. Our approach combines a structured evaluation format (feedback, highlight, and decision) with a high-confidence training set of 6K examples built through multi-stage, consensus-based filtering. We introduce two benchmarks: Ko Feedback Bench, for rubric-following evaluation under Korean language transfer, and RAG Quality Bench, for domain-specific evaluation in financial and legal settings. E-Star-12B achieves the strongest rubric alignment among small language models on Ko Feedback Bench, improving Pearson correlation by +0.173 over its base model. On RAG Quality Bench, the domain-adapted variant approaches frontier-model performance and adapts more stably than general instruction-tuned models. Overall, strong rubric-following capability serves as a reliable scaffold for subsequent domain adaptation.
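The rubric-alignment result is reported as a gain in Pearson correlation between evaluator scores and reference scores. Below is a minimal sketch of that kind of comparison. It assumes each benchmark item has one gold rubric score and one predicted score per model, which may differ from the paper's exact protocol. The helper names (pearson_r, alignment_gain) and the toy scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def alignment_gain(
    gold: Sequence[float],
    base_scores: Sequence[float],
    tuned_scores: Sequence[float],
) -> float:
    """Hypothetical helper: change in Pearson r of a tuned evaluator over its base model."""
    return pearson_r(tuned_scores, gold) - pearson_r(base_scores, gold)


if __name__ == "__main__":
    # Toy 1-5 rubric scores, invented for illustration (not results from the paper).
    gold = [5, 3, 4, 1, 2]
    base = [4, 4, 3, 2, 3]
    tuned = [5, 3, 4, 2, 2]
    print(f"gain in Pearson r: {alignment_gain(gold, base, tuned):+.3f}")
```

Correlation, rather than exact-match accuracy, rewards an evaluator whose scores move with the reference scores even when individual scores are off by one point.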
Anthology ID: 2026.gem-main.42
Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 456–471
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.42/
Cite (ACL):
Yonghoon Kwon, Heondeuk Lee, and Barom Kang. 2026. E-star 12B: Reliable Rubric-Following and Domain-Adaptive SLM Evaluator for Korean Industrial Settings. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 456–471, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
E-star 12B: Reliable Rubric-Following and Domain-Adaptive SLM Evaluator for Korean Industrial Settings (Kwon et al., GEM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.42.pdf