Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence

Yijiong Yu, Wei Wang, Ran Chen, Ji Pei


Abstract
Recent advances in reasoning models have demonstrated significant improvements in accuracy by employing detailed and comprehensive reasoning processes. However, generating these lengthy reasoning sequences is computationally expensive and time-consuming. To address this inefficiency, we leverage the inherent parallelizability of certain tasks to accelerate the reasoning process. Specifically, when multiple parallel reasoning steps exist, we decode multiple tokens per forward pass via a tree-like attention mask within a single sequence, avoiding additional memory usage. Experimental results show that our method achieves up to nearly 100% speedup in decoding while largely maintaining answer quality. Our code is available at https://github.com/yuyijiong/parallel-decoding-in-one-sequence
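
To illustrate the core idea of a tree-like attention mask within a single sequence, here is a minimal, hypothetical sketch (not the authors' implementation; see the linked repository for that). The function name `build_parallel_branch_mask` and the parameters `prefix_len` and `branch_lens` are illustrative assumptions: the sketch simply lays out several parallel reasoning branches after a shared prefix and blocks attention between sibling branches while keeping causal attention within each branch and to the prefix.

```python
import torch


def build_parallel_branch_mask(prefix_len: int, branch_lens: list[int]) -> torch.Tensor:
    """Build a tree-like attention mask for decoding several parallel
    reasoning branches inside one sequence (hypothetical sketch).

    Positions 0..prefix_len-1 hold the shared prompt/prefix; the branches
    are laid out one after another in the same sequence. Every token may
    attend causally to the shared prefix and to earlier tokens of its own
    branch, but never to tokens of a sibling branch.
    """
    total_len = prefix_len + sum(branch_lens)
    # Start from an ordinary causal mask (True = attention allowed).
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))

    # Block attention from each branch to the tokens of earlier branches.
    start = prefix_len
    for length in branch_lens:
        end = start + length
        mask[start:end, prefix_len:start] = False
        start = end
    return mask


if __name__ == "__main__":
    # Shared prefix of 4 tokens, then two parallel branches of 3 tokens each.
    m = build_parallel_branch_mask(prefix_len=4, branch_lens=[3, 3])
    print(m.int())
```

With such a mask, one forward pass can advance every branch by one token (one new token per branch), which is where the reported decoding speedup comes from when the reasoning steps are independent.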
Anthology ID:
2025.emnlp-main.457
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9018–9025
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.457/
Cite (ACL):
Yijiong Yu, Wei Wang, Ran Chen, and Ji Pei. 2025. Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 9018–9025, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence (Yu et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.457.pdf
Checklist:
2025.emnlp-main.457.checklist.pdf