Self-SoftCoT: A Self-Consistent Framework via Position-Aware Latent Space Reinforcement Learning

Liangliang Dong, Lianlei Shan, Shuaimin Li


Abstract
While Chain-of-Thought (CoT) reasoning empowers Large Language Models (LLMs) to tackle complex tasks, its reliance on discrete token decoding imposes an inherent Discreteness Bottleneck, limiting expressiveness to a restricted vocabulary space. Existing continuous reasoning approaches, such as SoftCoT, mitigate this but typically rely on external auxiliary models, resulting in complex deployment and fragmented inference pipelines. To address these challenges, we propose Self-SoftCoT, a self-contained framework that enables a frozen LLM to internally generate and consume latent thoughts without external assistants. By establishing a single-stream "Thinking → Speaking" closed loop, we decouple latent planning from explicit generation. Furthermore, we adopt Group Sequence Policy Optimization (GSPO) to stabilize learning and employ Position-Aware Independent Projection to mitigate representation homogenization. Experimental results on five reasoning benchmarks demonstrate that our method significantly improves the reasoning performance of frozen LLMs. Specifically, our Qwen2.5-based model uses only N=2 soft tokens to outperform the SoftCoT baseline (N=4), improving the average accuracy from 75.06% to 78.42%. Similarly, LLaMA-3.1 performance increases from 70.52% to 74.55%.
Anthology ID:
2026.acl-long.1496
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
32393–32414
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1496/
Cite (ACL):
Liangliang Dong, Lianlei Shan, and Shuaimin Li. 2026. Self-SoftCoT: A Self-Consistent Framework via Position-Aware Latent Space Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32393–32414, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Self-SoftCoT: A Self-Consistent Framework via Position-Aware Latent Space Reinforcement Learning (Dong et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1496.pdf
Checklist:
 2026.acl-long.1496.checklist.pdf