Ramón Fernandez Astudillo

2026

Confidence-Weighted Token Set Cover for Early Hypothesis Pruning in Self-Consistency
Md Arafat Sultan | Ramón Fernandez Astudillo
Findings of the Association for Computational Linguistics: ACL 2026

Despite its simplicity and efficacy, the high token expenditure of self-consistency can limit its practical utility. We investigate whether early hypothesis pruning can improve the token efficiency of self-consistency for long chain-of-thought reasoning tasks, while preserving its parallelism. Concretely, we generate all solutions in parallel but periodically prune intermediate hypotheses based on two lightweight indicators: (a) the model’s confidence in each hypothesis, and (b) the lexical coverage of all current hypotheses by candidate subsets. We design a fast weighted set cover algorithm that utilizes the two indicators; evaluation of five LLMs on three math benchmarks shows that our method improves token efficiency in most cases, with reductions of 10-35% in many.
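The abstract does not spell out the pruning algorithm, so the following is only a minimal sketch of a generic greedy weighted set cover, not the authors' method. It assumes that each hypothesis is reduced to its set of whitespace-separated tokens, that a per-hypothesis confidence score in (0, 1] is already available, and that a hypothesis is scored by its confidence times the number of tokens it newly covers. The function name `prune_hypotheses` and the `coverage` parameter are hypothetical.

```python
def prune_hypotheses(hypotheses, confidences, coverage=1.0):
    """Greedy confidence-weighted set cover over hypothesis token sets.

    Illustrative only: the scoring rule and tokenization are assumptions,
    not details taken from the paper.
    Returns the indices of the hypotheses to keep, in selection order.
    """
    # Assumption: lexical content is approximated by whitespace tokens.
    token_sets = [set(h.split()) for h in hypotheses]
    universe = set().union(*token_sets)
    target = coverage * len(universe)

    covered, kept = set(), []
    remaining = set(range(len(hypotheses)))
    while remaining and len(covered) < target:
        # Score = confidence x number of not-yet-covered tokens;
        # ties are broken in favor of the lower index.
        best = max(
            remaining,
            key=lambda i: (confidences[i] * len(token_sets[i] - covered), -i),
        )
        if not token_sets[best] - covered:
            break  # no remaining hypothesis adds new tokens
        kept.append(best)
        covered |= token_sets[best]
        remaining.remove(best)
    return kept


if __name__ == "__main__":
    hyps = ["a b c", "a b", "c d", "a b c"]
    conf = [0.9, 0.5, 0.8, 0.4]
    print(prune_hypotheses(hyps, conf))
```

In this toy setting, hypotheses whose tokens are already covered by higher-confidence ones are dropped; lowering `coverage` below 1.0 prunes more aggressively.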

Co-authors

Md Arafat Sultan (1)

Venues

Findings (1)
