Igor Kiselev
2026
ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning
Vladislav Smirnov | Quang-Chieu Nguyen | Sergey Senichev | Minh Ngoc Ta | Ekaterina Fadeeva | Artem Vazhentsev | Daria Galimzianova | Nikolai Rozanov | Viktor Mazanov | Jingwei Ni | Tianyi Wu | Igor Kiselev | Mrinmaya Sachan | Iryna Gurevych | Preslav Nakov | Timothy Baldwin | Artem Shelmanov
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Test-time compute (TTC) scaling has emerged as a powerful paradigm for improving large language model (LLM) reasoning by allocating additional compute during inference, e.g., via multi-sample generation and verifier-based reranking. Existing TTC scaling strategies and reasoning scorers remain fragmented, evaluated under inconsistent protocols, and are rarely analyzed through the lens of quality-cost trade-offs. We introduce ThinkBooster, a unified framework for seamless test-time compute scaling of LLM reasoning, which consists of (i) a modular Python library implementing state-of-the-art TTC scaling strategy and scorer families, (ii) a benchmark that jointly evaluates performance and computational efficiency, and (iii) a deployable OpenAI-compatible proxy service that enables drop-in integration of adaptive reasoning into real-world applications. We further provide a demo visual debugger for inspecting the reasoning trajectories, intermediate selection decisions, and alternative reasoning paths. Empirical results on mathematical and coding tasks reveal the performance-compute trade-offs of TTC scaling strategies and scoring methods and demonstrate that ThinkBooster provides practical gains in real-world tasks. The code is available online under an MIT license.
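The abstract names multi-sample generation with verifier-based reranking as one TTC scaling strategy and stresses quality-cost trade-offs. The sketch below illustrates that generic best-of-N pattern while also tallying generation cost. It is not ThinkBooster's API. `Candidate`, `best_of_n`, the token-count cost proxy, and the toy sampler and scorer are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    text: str    # one sampled reasoning trace / final answer
    tokens: int  # tokens spent generating it (a simple cost proxy, assumed)


def best_of_n(
    generate: Callable[[str], Candidate],
    score: Callable[[str, str], float],
    prompt: str,
    n: int,
) -> Tuple[Candidate, int]:
    """Sample n candidates, rerank them with a scorer, return the top one and total cost."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[Candidate] = [generate(prompt) for _ in range(n)]
    total_tokens = sum(c.tokens for c in candidates)
    # Higher score = more trusted by the (hypothetical) verifier; ties keep the earliest sample.
    best = max(candidates, key=lambda c: score(prompt, c.text))
    return best, total_tokens


# Toy usage: a fixed "sampler" and a stand-in verifier that prefers answers near 42.
samples = iter([Candidate("41", 12), Candidate("42", 15), Candidate("40", 9)])
best, cost = best_of_n(
    generate=lambda prompt: next(samples),
    score=lambda prompt, text: -abs(int(text) - 42),
    prompt="What is 6 * 7?",
    n=3,
)
print(best.text, cost)  # 42 36
```

Raising `n` buys more chances of a high-scoring answer at a linearly growing token cost, which is the kind of trade-off the benchmark is described as measuring.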
2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov | Ekaterina Fadeeva | Akim Tsvigun | Ivan Tsvigun | Zhuohan Xie | Igor Kiselev | Nico Daheim | Caiqi Zhang | Artem Vazhentsev | Mrinmaya Sachan | Preslav Nakov | Timothy Baldwin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
LLMs tend to hallucinate, i.e., to sporadically generate false or fabricated information, and users generally lack the tools to detect when this happens. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs, aiding in the identification of potential hallucinations. In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that substantially enhance their ability to capture uncertainty compared to unsupervised UQ methods. Their strong performance stems from the transformer architecture in their design and from informative features derived from LLM attention maps and logits. Our experiments show that these heads are highly robust and achieve state-of-the-art performance in claim-level hallucination detection across both in-domain and out-of-domain prompts. Moreover, these modules demonstrate strong generalization to languages they were not explicitly trained on. We pre-train a collection of UQ heads for popular LLM series, including Mistral, Llama, and Gemma. We publicly release both the code and the pre-trained heads.
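The abstract contrasts supervised UQ heads with unsupervised UQ methods and mentions features derived from LLM logits. As a point of reference only, the sketch below computes one common unsupervised, logit-derived signal: mean next-token entropy over the tokens of a single claim. It is not the UQ head itself, which is a trained transformer module over attention and logit features. The function names, the toy vocabulary, and the `[start, end)` claim-span indexing are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def claim_uncertainty(step_logits: Sequence[Sequence[float]], start: int, end: int) -> float:
    """Mean token entropy over a claim's tokens at positions [start, end) (hypothetical helper)."""
    span = step_logits[start:end]
    if not span:
        raise ValueError("claim span is empty")
    return sum(token_entropy(row) for row in span) / len(span)


# Toy logits for a 4-token output over a 3-word vocabulary.
logits = [
    [8.0, 0.0, 0.0],  # confident token
    [8.0, 0.0, 0.0],  # confident token
    [0.0, 0.0, 0.0],  # uniform: maximally uncertain
    [0.5, 0.4, 0.3],  # near-uniform
]
print(claim_uncertainty(logits, 0, 2) < claim_uncertainty(logits, 2, 4))  # True
```

A hand-crafted score like this needs no training, which is precisely the kind of unsupervised baseline the supervised heads are reported to outperform.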
Co-authors
- Artem Shelmanov 3
- Timothy Baldwin 2
- Ekaterina Fadeeva 2
- Preslav Nakov 2
- Mrinmaya Sachan 2
- Artem Vazhentsev 2
- Sergei Bratchikov 1
- Nico Daheim 1
- Daria Galimzianova 1
- Iryna Gurevych 1
- Konstantin Korolev 1
- Viktor Mazanov 1
- Quang-Chieu Nguyen 1
- Jingwei Ni 1
- Aleksandr Nikolich 1
- Nikolai Rozanov 1
- Sergey Senichev 1
- Vladislav Smirnov 1
- Minh Ngoc Ta 1
- Akim Tsvigun 1
- Ivan Tsvigun 1
- Tianyi Wu 1
- Zhuohan Xie 1
- Caiqi Zhang 1