Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts

Hanhua Hong, Chenghao Xiao, Yang Wang, Yiqi Liu, Wenge Rong, Chenghua Lin


Abstract
Evaluating natural language generation systems is challenging due to the diversity of valid outputs. While human evaluation is the gold standard, it suffers from inconsistencies, lack of standardization, and demographic biases, limiting reproducibility. LLM-based evaluators offer a scalable alternative but are highly sensitive to prompt design, where small variations can lead to significant discrepancies. In this work, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Our method requires only a single evaluation sample and eliminates the need for time-consuming manual prompt engineering, thereby improving both efficiency and robustness. Our work contributes toward a new direction for more robust and efficient LLM-based evaluation.
Anthology ID:
2026.tacl-1.31
Volume:
Transactions of the Association for Computational Linguistics, Volume 14
Year:
2026
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
689–710
URL:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.31/
DOI:
10.1162/tacl.a.617
Cite (ACL):
Hanhua Hong, Chenghao Xiao, Yang Wang, Yiqi Liu, Wenge Rong, and Chenghua Lin. 2026. Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts. Transactions of the Association for Computational Linguistics, 14:689–710.
Cite (Informal):
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts (Hong et al., TACL 2026)
PDF:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.31.pdf