Calibrating Model-Based Evaluation Metrics for Summarization

Hongye Liu, Dhanajit Brahma, Ricardo Henao


Abstract
Recent advances in summary evaluation rely on model-based metrics to assess quality dimensions such as completeness, conciseness, and faithfulness. However, these methods often require large language models, and their predicted scores are frequently miscalibrated, which limits their reliability. Moreover, evaluating the average quality across different summaries of a single document typically requires access to multiple reference summaries. Here, we propose a general framework that generates individual and average proxy scores without relying on reference summaries, human annotations, or expensive model-based metrics. We also propose group isotonic regression binning (GIRB), a calibration method that adjusts the raw predictions to better align with ground-truth evaluation metrics. While we focus on continuous-value scenarios, such as summarization, the method is also applicable to discrete-value tasks, such as question answering. Experiments on seven datasets demonstrate that our approach consistently outperforms existing baselines.
Anthology ID:
2026.findings-acl.1760
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
35285–35315
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1760/
Cite (ACL):
Hongye Liu, Dhanajit Brahma, and Ricardo Henao. 2026. Calibrating Model-Based Evaluation Metrics for Summarization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35285–35315, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Calibrating Model-Based Evaluation Metrics for Summarization (Liu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1760.pdf
Checklist:
 2026.findings-acl.1760.checklist.pdf