ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

Vladislav Smirnov; Quang-Chieu Nguyen; Sergey Senichev; Minh Ngoc Ta; Ekaterina Fadeeva; Artem Vazhentsev; Daria Galimzianova; Nikolai Rozanov; Viktor Mazanov; Jingwei Ni; Tianyi Wu; Igor Kiselev; Mrinmaya Sachan; Iryna Gurevych; Preslav Nakov; Timothy Baldwin; Artem Shelmanov

Abstract

Test-time compute (TTC) scaling has emerged as a powerful paradigm for improving large language model (LLM) reasoning by allocating additional compute during inference, e.g., via multi-sample generation and verifier-based reranking. Existing TTC scaling strategies and reasoning scorers remain fragmented, evaluated under inconsistent protocols, and are rarely analyzed through the lens of quality-cost trade-offs. We introduce ThinkBooster, a unified framework for seamless test-time compute scaling of LLM reasoning, which consists of (i) a modular Python library implementing state-of-the-art TTC scaling strategy and scorer families, (ii) a benchmark that jointly evaluates performance and computational efficiency, and (iii) a deployable OpenAI-compatible proxy service that enables drop-in integration of adaptive reasoning into real-world applications. We further provide a demo visual debugger for inspecting the reasoning trajectories, intermediate selection decisions, and alternative reasoning paths. Empirical results on mathematical and coding tasks reveal the performance-compute trade-offs of TTC scaling strategies and scoring methods and demonstrate that ThinkBooster provides practical gains in real-world tasks. The code is available online under an MIT license.
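As a rough illustration of the multi-sample generation and verifier-based reranking mentioned in the abstract, the sketch below shows a generic best-of-N loop. It is not ThinkBooster's API: `best_of_n`, `toy_sample`, and `toy_score` are hypothetical names, and the sampler and scorer are toy stand-ins for an LLM and a verifier. The function also returns the number of samples drawn, as a crude proxy for compute cost.

```python
import random
from collections.abc import Callable


def best_of_n(
    prompt: str,
    sample: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    n: int = 4,
    seed: int = 0,
) -> tuple[str, int]:
    """Draw n candidates and return the top-scoring one plus the sample count."""
    rng = random.Random(seed)
    # Multi-sample generation: n independent candidates for the same prompt.
    candidates = [sample(prompt, rng) for _ in range(n)]
    # Reranking: keep the candidate the scorer rates highest.
    best = max(candidates, key=lambda c: score(prompt, c))
    return best, n


# Hypothetical stand-in for an LLM call, so the sketch runs without a model.
def toy_sample(prompt: str, rng: random.Random) -> str:
    return f"answer={rng.randint(0, 9)}"


# Hypothetical stand-in for a verifier: prefers answers close to 7.
def toy_score(prompt: str, candidate: str) -> float:
    return -abs(int(candidate.split("=")[1]) - 7)


if __name__ == "__main__":
    answer, cost = best_of_n("What is 3 + 4?", toy_sample, toy_score, n=8)
    print(answer, cost)
```

Raising `n` spends more compute for a better chance of including a high-scoring candidate, which is the quality-cost trade-off the abstract refers to.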

Anthology ID: 2026.acl-demo.70
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Greg Durrett, Ping Jian
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 715–727
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.70/
Cite (ACL): Vladislav Smirnov, Quang-Chieu Nguyen, Sergey Senichev, Minh Ngoc Ta, Ekaterina Fadeeva, Artem Vazhentsev, Daria Galimzianova, Nikolai Rozanov, Viktor Mazanov, Jingwei Ni, Tianyi Wu, Igor Kiselev, Mrinmaya Sachan, Iryna Gurevych, Preslav Nakov, Timothy Baldwin, and Artem Shelmanov. 2026. ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 715–727, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning (Smirnov et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.70.pdf