Junhan Shi


2025

Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling
Zhenning Shi | Yijia Zhu | Yi Xie | Junhan Shi | Guorui Xie | Haotian Zhang | Yong Jiang | Congcong Miao | Qing Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) excel at complex reasoning tasks but often suffer from overconfidence and computational inefficiency due to fixed computation budgets and miscalibrated confidence estimates. We present a novel framework for computationally efficient, trustworthy reasoning under uncertainty, introducing two complementary techniques: Diversity-Aware Self-Signal Dilution (DASD) and Convergent Adaptive Weighted Sampling (CAWS). DASD operates in an unsupervised manner to dilute overconfident, semantically redundant reasoning paths, thereby producing better-calibrated internal confidence estimates. CAWS dynamically allocates computational resources at inference time by aggregating these signals and terminating computation once answer dominance and stability are achieved. Comprehensive experiments across three reasoning datasets demonstrate that our approach maintains accuracy while reducing inference cost by over 70%, surpassing competitive baselines. Our framework provides a scalable, unsupervised solution for reliable and efficient LLM reasoning.
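To make the two ideas concrete, here is a minimal Python sketch of the control flow the abstract describes: dilute the confidence of semantically redundant sampled paths, then stop sampling once one answer dominates the weighted vote and stays on top. This is an illustration under stated assumptions, not the paper's algorithm; the token-overlap similarity measure, the `dilution_factor`, `dominance_threshold`, and `stability_window` parameters, and the `generate_path` interface are all hypothetical stand-ins.

```python
# Illustrative sketch only; not the authors' DASD/CAWS implementation.
from collections import defaultdict


def similarity(path_a: str, path_b: str) -> float:
    """Hypothetical semantic-similarity stand-in (a real system might use
    embedding cosine similarity); here a crude token-overlap Jaccard score."""
    a, b = set(path_a.split()), set(path_b.split())
    return len(a & b) / max(len(a | b), 1)


def diluted_confidence(path: str, conf: float, seen_paths: list[str],
                       dilution_factor: float = 0.5) -> float:
    """DASD-style idea: shrink the confidence of a path that is semantically
    redundant with already-sampled paths, so duplicates don't inflate votes."""
    if not seen_paths:
        return conf
    redundancy = max(similarity(path, p) for p in seen_paths)
    return conf * (1.0 - dilution_factor * redundancy)


def adaptive_sample(generate_path, max_samples: int = 32,
                    dominance_threshold: float = 0.6,
                    stability_window: int = 3):
    """CAWS-style idea: sample reasoning paths one at a time, aggregate
    diluted confidence per answer, and terminate once one answer both
    dominates the weighted vote and has led for `stability_window` rounds."""
    votes = defaultdict(float)
    seen_paths, leader, leader_streak = [], None, 0
    for _ in range(max_samples):
        # One sampled chain-of-thought, its answer, and a raw confidence score.
        path, answer, conf = generate_path()
        votes[answer] += diluted_confidence(path, conf, seen_paths)
        seen_paths.append(path)
        top, top_weight = max(votes.items(), key=lambda kv: kv[1])
        leader_streak = leader_streak + 1 if top == leader else 1
        leader = top
        # Early termination: dominant weighted share plus a stable leader.
        if (top_weight / sum(votes.values()) >= dominance_threshold
                and leader_streak >= stability_window):
            break
    return leader, len(seen_paths)
```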

SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration
Junhan Shi | Yijia Zhu | Zhenning Shi | Dan Zhao | Qing Li | Yong Jiang
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Reasoning Models (LRMs) demonstrate strong performance on complex tasks through chain-of-thought (CoT) reasoning. However, they suffer from high inference latency due to lengthy reasoning chains. In this paper, we propose SpecCoT, a collaborative framework that combines large and small models for effective yet efficient reasoning. Unlike traditional speculative decoding, which operates at the token level, SpecCoT adopts a step-level verification strategy: the large model first establishes the reasoning direction, and for each intermediate step, the small model generates multiple candidate drafts in parallel. The large model then verifies these drafts, either selecting the most suitable one or rejecting them all and generating its own. SpecCoT thus balances reasoning quality with inference efficiency through fine-grained model cooperation. Experiments across diverse tasks show SpecCoT reduces inference latency by 1.7-4.1× while maintaining comparable accuracy to standard large model inference.
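The step-level draft-and-verify loop the abstract describes can be sketched as follows. This is a rough illustration, not the authors' code: `large_model`, `small_model`, their methods, and the acceptance threshold are hypothetical placeholders for whatever interface a real system would expose.

```python
# Illustrative sketch only; the model interface below is assumed, not SpecCoT's API.
def spec_cot(question: str, large_model, small_model,
             num_drafts: int = 4, max_steps: int = 20) -> str:
    """Step-level speculation: the large model sets the reasoning direction,
    the small model drafts each intermediate step in parallel, and the large
    model either accepts the best draft or generates the step itself."""
    context = large_model.start_reasoning(question)  # establish the direction
    for _ in range(max_steps):
        # Small model proposes several candidate next steps (cheap to produce,
        # and independent, so they can be generated in parallel).
        drafts = [small_model.draft_step(context) for _ in range(num_drafts)]
        # Large model verifies the candidates in one scoring pass.
        scores = [large_model.score_step(context, d) for d in drafts]
        best = max(range(num_drafts), key=lambda i: scores[i])
        if scores[best] >= large_model.accept_threshold:
            step = drafts[best]                       # accept the draft
        else:
            step = large_model.generate_step(context)  # reject all; write it itself
        context += step
        if large_model.is_final(step):                # reasoning chain complete
            break
    return large_model.extract_answer(context)
```

The speedup in such schemes comes from the large model mostly scoring short candidate steps instead of generating every token of the chain, falling back to its own generation only when no draft passes verification.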