@inproceedings{cruz-etal-2025-improving,
title = "Improving Large Language Model Confidence Estimates using Extractive Rationales for Classification",
author = "Cruz, Jane Arleth Dela and
Hendrickx, Iris and
Larson, Martha",
editor = "Dhole, Kaustubh and
Clinciu, Miruna",
booktitle = "Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM{\texttwosuperior})",
month = jul,
year = "2025",
address = "Vienna, Austria and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.49/",
pages = "549--560",
ISBN = "979-8-89176-261-9",
    abstract = "The adoption of large language models (LLMs) in high-stakes scenarios continues to be a challenge due to the lack of effective confidence calibration. Although LLMs are capable of providing convincing self-explanations and verbalizing confidence in NLP tasks, they tend to exhibit overconfidence when using generative or free-text rationales (e.g., Chain-of-Thought), where reasoning steps tend to lack verifiable grounding. In this paper, we investigate whether adding explanations in the form of extractive rationales {--} snippets of the input text that directly support the predictions {--} can improve the confidence calibration of LLMs in classification tasks. We examine two approaches for integrating these rationales: (1) a one-stage rationale-generation with prediction and (2) a two-stage rationale-guided confidence calibration. We evaluate these approaches on a disaster tweet classification task using four different off-the-shelf LLMs. Our results show that extracting rationales both before and after prediction can improve the confidence estimates of the LLMs. Furthermore, we find that replacing valid extractive rationales with irrelevant ones significantly lowers model confidence, highlighting the importance of rationale quality. This simple yet effective method improves LLM verbalized confidence and reduces overconfidence in possible hallucinations."
}
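The two-stage rationale-guided approach described in the abstract could look roughly like the sketch below. This is not code from the paper: it assumes a generic `llm(prompt) -> str` wrapper around whatever off-the-shelf model is used, and the prompt wording, function names, and output-parsing are illustrative assumptions only.

```python
import re


def llm(prompt: str) -> str:
    """Assumed interface: prompt text in, model completion out (any off-the-shelf LLM)."""
    raise NotImplementedError("Wire this up to your model of choice.")


def predict_with_rationale(tweet: str) -> tuple[str, str]:
    """Stage 1 (sketch): ask the model to quote a supporting snippet and classify the tweet."""
    prompt = (
        "Classify the tweet as 'disaster' or 'not disaster'.\n"
        "First copy the exact snippet from the tweet that supports your decision.\n"
        f"Tweet: {tweet}\n"
        "Answer in the format:\nRationale: <verbatim snippet>\nLabel: <disaster|not disaster>"
    )
    reply = llm(prompt)
    # Naive parsing; assumes the model follows the requested format.
    rationale = re.search(r"Rationale:\s*(.*)", reply).group(1).strip()
    label = re.search(r"Label:\s*(.*)", reply).group(1).strip()
    return label, rationale


def rationale_guided_confidence(tweet: str, label: str, rationale: str) -> float:
    """Stage 2 (sketch): elicit a verbalized confidence conditioned on the extracted rationale."""
    prompt = (
        f"Tweet: {tweet}\n"
        f"Predicted label: {label}\n"
        f"Supporting snippet: {rationale}\n"
        "Considering only how well the snippet supports the label, "
        "state your confidence as a number between 0 and 1.\nConfidence:"
    )
    reply = llm(prompt)
    match = re.search(r"\d*\.?\d+", reply)
    return float(match.group()) if match else 0.5  # fall back to 0.5 if no number is returned
```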
Markdown (Informal)
[Improving Large Language Model Confidence Estimates using Extractive Rationales for Classification](https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.49/) (Cruz et al., GEM 2025)