Qingjie Li

2026

ConMA: Confidence-Guided Kernel Sampling with Multi-Stage Aggregation for LLM Reasoning
Yinuo Wang | Qingjie Li | Wenyao Cui | Qiuchi Li | Zhang Huaping
Findings of the Association for Computational Linguistics: ACL 2026

Test-time scaling (TTS) enhances LLM reasoning capabilities by sampling and aggregating diverse solution trajectories. However, existing approaches often rely on external verifiers and one-shot independent sampling, which results in inefficient budget allocation and underutilizes interim high-quality trajectories. We propose ConMA, a training-free, verifier-free TTS framework that reallocates a fixed inference budget into iterative sample–filter–diversify–select cycles: it filters answer groups based on intrinsic token-probability confidence, enriches candidates through diversity-aware expansion, and employs repeated single-choice selection for multi-stage refinement. Across multiple benchmarks, ConMA consistently improves accuracy under fixed budgets. With a maximum budget of N=64, ConMA boosts Qwen3-4B to 80% accuracy on AIME25, significantly outperforming strong baselines while converging early with only 18 samples on average, substantially reducing inference cost.
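The filtering step named in the abstract (keeping answer groups based on intrinsic token-probability confidence) can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so everything here is an assumption made for illustration: the function names `trajectory_confidence` and `filter_answer_groups`, the use of geometric-mean token probability as the confidence score, and summing scores within an answer group. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def trajectory_confidence(token_logprobs):
    """Geometric-mean token probability of one trajectory (assumed metric)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def filter_answer_groups(samples, keep=2):
    """Group samples by final answer and keep the highest-scoring groups.

    `samples` is a list of (answer, token_logprobs) pairs. Each group's
    score is the sum of its members' confidences (an assumed rule).
    Returns the top `keep` (answer, score) pairs, best first.
    """
    scores = defaultdict(float)
    for answer, token_logprobs in samples:
        scores[answer] += trajectory_confidence(token_logprobs)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:keep]


# Hypothetical data: four sampled trajectories with three distinct answers.
samples = [
    ("42", [-0.1, -0.2]),
    ("42", [-0.3, -0.1]),
    ("17", [-1.0, -1.2]),
    ("9", [-2.0, -2.5]),
]
kept = filter_answer_groups(samples, keep=2)
```

In this toy input the answer "42" is ranked first because it appears twice with high token probabilities, and the lowest-confidence answer "9" is dropped. In an iterative scheme like the one described, the surviving groups would then feed the expansion and selection stages.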

