Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability

Tu Anh Dinh, Jan Niehues


Abstract
Quality Estimation (QE) is the task of estimating the quality of a model's output at inference time, when the ground truth is not available. Deriving output quality from the model's output probability is the simplest, lowest-effort approach. However, we show that the output probability of text-generation models can appear underconfident: at each output step there can be multiple correct options, which spreads out the probability distribution. Lower probability therefore does not necessarily mean lower output quality. Motivated by this observation, we propose a QE approach called BoostedProb, which boosts the model's confidence in cases where there are multiple viable output options. With no increase in complexity, BoostedProb is notably better than raw model probability in different settings, achieving an average improvement of +0.194 in Pearson correlation with ground-truth quality. It also comes close to, or outperforms, more costly approaches such as supervised or ensemble-based QE in certain settings.
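The abstract does not spell out the boosting mechanism, so the following is a minimal sketch of one plausible reading, not the paper's exact method: at each decoding step, treat a nucleus-style top-p set as the "viable" options, and credit the chosen token with the total mass of that set rather than its raw probability. The function names, the top_p threshold, and the length-normalized log-mean aggregation are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def boosted_step_confidence(logits: torch.Tensor, chosen_id: int,
                            top_p: float = 0.9) -> float:
    """Confidence for a single decoding step (hypothetical sketch).

    If the chosen token falls inside a set of viable candidates
    (here: the smallest nucleus covering top_p probability mass),
    its raw probability is boosted to the total mass of that set.
    """
    probs = F.softmax(logits, dim=-1)                     # (vocab_size,)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    # Smallest prefix of top-ranked tokens whose mass reaches top_p.
    cutoff = int(torch.searchsorted(cum, torch.tensor(top_p))) + 1
    cutoff = min(cutoff, probs.numel())
    viable_ids = set(sorted_ids[:cutoff].tolist())
    if chosen_id in viable_ids and cutoff > 1:
        # Multiple viable options: a spread-out distribution does not
        # signal low quality, so credit the full viable mass.
        return float(cum[cutoff - 1])
    return float(probs[chosen_id])                        # no boost


def sequence_confidence(step_logits: torch.Tensor,
                        token_ids: torch.Tensor,
                        top_p: float = 0.9) -> float:
    """Length-normalized mean log-confidence over all output steps."""
    confs = torch.tensor([
        boosted_step_confidence(step_logits[t], int(token_ids[t]), top_p)
        for t in range(token_ids.shape[0])
    ])
    return confs.log().mean().item()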
Anthology ID:
2025.emnlp-main.166
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Note:
Pages:
3364–3382
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.166/
Cite (ACL):
Tu Anh Dinh and Jan Niehues. 2025. Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3364–3382, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability (Dinh & Niehues, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.166.pdf
Checklist:
2025.emnlp-main.166.checklist.pdf