The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

Sanket Badhe; Priyanka Tiwari; Deep Shah

Abstract

Large Language Models are increasingly used as zero-shot classifiers in complex reasoning tasks. However, standard constrained decoding suffers from a phenomenon we define as Renormalization Bias. When a model is restricted to a small set of target labels, the standard softmax operation discards the probability mass that the original distribution assigned to semantic synonyms of those labels. This loss of information, which we call the Silent Vote, results in artificial overconfidence and poor calibration. We propose Semantic Softmax, an inference-time layer that recovers this lost information by aggregating the scores of the semantic neighborhood surrounding each target label. We evaluate this approach with Qwen-3 and Phi-4-mini models on the GoEmotions and Civil Comments datasets. Our results demonstrate consistent improvements across all evaluation metrics: Semantic Softmax substantially reduces Expected Calibration Error (ECE) and Brier Score, while simultaneously improving discriminative performance in terms of AUROC and Macro-F1. By accounting for linguistic nuances, our method provides a better-calibrated and more accurate alternative for zero-shot classification.
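The contrast the abstract draws can be illustrated with a small sketch. The abstract does not state the exact aggregation rule, so this example assumes log-sum-exp pooling (equivalent to summing the unnormalized probabilities of each label's neighbors) and hand-picked neighbor lists. The function names, the toy logits, and the neighborhoods are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(scores):
    """Numerically stable log(sum(exp(s))) over a list of floats."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def constrained_softmax(logits, labels):
    """Baseline: renormalize over the label tokens only.

    Mass on any token outside `labels` is discarded.
    """
    probs = softmax([logits[label] for label in labels])
    return dict(zip(labels, probs))


def neighborhood_softmax(logits, neighborhoods):
    """Assumed aggregation: pool each label's neighbor logits with
    log-sum-exp, then apply softmax over the pooled scores."""
    labels = list(neighborhoods)
    pooled = [logsumexp([logits[tok] for tok in neighborhoods[label]])
              for label in labels]
    return dict(zip(labels, softmax(pooled)))


# Toy next-token logits (hypothetical values). The label token "joy"
# outscores "anger", but near-synonyms of "anger" carry substantial mass.
logits = {"joy": 3.0, "happy": 1.0, "anger": 1.0, "furious": 2.8, "mad": 2.6}

# Hand-picked neighborhoods (hypothetical); each includes the label itself.
neighborhoods = {
    "joy": ["joy", "happy"],
    "anger": ["anger", "furious", "mad"],
}

baseline = constrained_softmax(logits, ["joy", "anger"])
pooled = neighborhood_softmax(logits, neighborhoods)

print("constrained:", baseline)
print("neighborhood-pooled:", pooled)
```

In this toy setting the constrained baseline is highly confident in "joy" because it sees only the two label tokens, whereas pooling the neighbors shifts most of the probability to "anger". How neighborhoods are built and how their scores are weighted are design choices that this sketch leaves open.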

Anthology ID: 2026.gem-main.48
Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 511–517
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.48/
Cite (ACL): Sanket Badhe, Priyanka Tiwari, and Deep Shah. 2026. The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 511–517, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods (Badhe et al., GEM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.48.pdf
