Kaixuan Huang
2025
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu | Yifu Lu | Yifan Zeng | Jiacheng Guo | Jiayi Geng | Chenhao Zhu | Xinzhe Juan | Ling Yang | Huazheng Wang | Kaixuan Huang | Yue Wu | Mengdi Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but it poses the challenge of balancing computational efficiency with output quality. Best-of-N (BoN) sampling, a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but at a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into BoN sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN on the AlpacaFarm, UltraFeedback, GSM8K, HH-RLHF, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves a 65% win rate at maximum response lengths of 192 and 384 tokens, outperforming standard BoN at the same computational cost. Furthermore, TreeBoN achieves around a 60% win rate on longer responses, showcasing its scalability and alignment efficacy.
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo | Yue Wu | Jiahao Qiu | Kaixuan Huang | Xinzhe Juan | Ling Yang | Mengdi Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method in which verifiers iteratively refine their judgments based on their previous assessments. Unlike one-round verification or multi-model debate approaches, our method leverages consistency across a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (MathCheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek-R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1.
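The iterate-until-stable idea can be sketched as a small control loop. This is an illustrative skeleton, not the paper's code: `verify` is a hypothetical callable (e.g. an LLM verifier that returns the index of the first erroneous step, given its own previous judgment to reflect on), and `max_rounds` is an assumed budget.

```python
def temporal_consistency_verify(solution, verify, max_rounds=5):
    """Iterative self-reflection verification sketch.

    verify(solution, prev_judgment) -> judgment (placeholder; e.g. the
    index of the first erroneous step, or None if no error is found).
    Stops once consecutive rounds agree (temporal consistency), unlike
    one-round verification, which commits to the first judgment.
    """
    prev = None
    for _ in range(max_rounds):
        current = verify(solution, prev)
        if prev is not None and current == prev:
            return current  # converged: two consecutive rounds agree
        prev = current
    return prev  # budget exhausted; fall back to the last judgment
```

For instance, a verifier that judges step 2 faulty in round one but settles on step 3 in rounds two and three returns 3, since the judgment has stabilized.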