Towards Inference-time Scaling for Continuous Space Reasoning

Minghan Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari

Abstract

Inference-time scaling, which generates multiple samples and re-ranks them with a Process or Outcome Reward Model (PRM or ORM), has proven effective for text-based reasoning in large language models. This paper investigates whether these established techniques can be adapted to reasoning in the continuous space, using the COCONUT (CITATION) continuous space reasoning LM as the backbone. We demonstrate that diverse reasoning paths can be generated through dropout-based sampling. Our Pass@N analysis of the generated samples reveals potential for a significant performance gain, akin to the gains observed in the discrete space. However, we highlight unique challenges in materializing this gain in the continuous thought space. In particular, recipes for data generation and for training PRMs and ORMs that work in the discrete space unlock only marginal improvements in the continuous space. By probing various aspects, including geometric properties and trajectory dynamics, we identify the underlying reasons that prevent effective discrimination between correct and incorrect reasoning, which is essential for PRMs and ORMs to function. Our findings reveal that current limitations stem from the absence of key inductive biases in continuous thought representations.
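
The gap the abstract describes, between what Pass@N promises and what reward-model re-ranking delivers, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy correctness labels, and the toy reward scores are all hypothetical, and the scores stand in for whatever a PRM or ORM would output.

```python
from typing import Sequence


def pass_at_n(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems where at least one of the N samples is correct.

    This is an upper bound on what any re-ranker could achieve.
    """
    return sum(any(row) for row in correct) / len(correct)


def best_of_n_accuracy(
    correct: Sequence[Sequence[bool]],
    scores: Sequence[Sequence[float]],
) -> float:
    """Accuracy when the highest-scored sample is selected for each problem."""
    hits = 0
    for row_correct, row_scores in zip(correct, scores):
        # Index of the sample the (hypothetical) reward model ranks first.
        best = max(range(len(row_scores)), key=row_scores.__getitem__)
        hits += row_correct[best]
    return hits / len(correct)


# Toy data (hypothetical): 4 problems, N = 3 samples each.
correct = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]
# Hypothetical reward-model scores for the same samples.
scores = [
    [0.9, 0.4, 0.1],  # ranks an incorrect sample first
    [0.2, 0.3, 0.5],
    [0.1, 0.8, 0.3],
    [0.2, 0.1, 0.7],
]

print(pass_at_n(correct))                   # oracle ceiling
print(best_of_n_accuracy(correct, scores))  # what re-ranking recovers
```

In this toy setting the re-ranker recovers only part of the oracle ceiling because it mis-ranks one problem; a reward model that cannot separate correct from incorrect samples would leave most of the Pass@N headroom unused.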

Anthology ID: 2026.findings-acl.1338
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26842–26856
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1338/
Cite (ACL): Minghan Wang, Thuy-Trang Vu, Ehsan Shareghi, and Gholamreza Haffari. 2026. Towards Inference-time Scaling for Continuous Space Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26842–26856, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Towards Inference-time Scaling for Continuous Space Reasoning (Wang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1338.pdf
Checklist: 2026.findings-acl.1338.checklist.pdf
