Dual-Reasoner: Bridging Interleaved Atomicity and Streaming Latency via Thinking-while-Talking

Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haoyu Yang, Junboli, Jun Fang, Lin Li, Qingyang Hong


Abstract
Integrating explicit Chain-of-Thought (CoT) reasoning into end-to-end spoken dialogue models enhances intelligence but incurs prohibitive latency. The "Thinking-while-Talking" paradigm alleviates this delay, but it compromises block atomicity, severing the logical connection between interleaved thought and speech. To address this, we present Dual-Reasoner, which employs a Streaming Masking Mechanism, supported by our Dual-Think-30k dataset, to guarantee uninterrupted audio streaming. To strictly align the fragmented thinking blocks with the speech generation they serve, we introduce the Atomic-Consistency Restoration framework. To preserve capability on high-difficulty reasoning, this framework uses a quadruple-constraint system to reconstruct logical atomicity, ensuring that "think" chunks act as a rigorous anchor for "talk" outputs. Experimental results show that Dual-Reasoner achieves comprehensive reasoning enhancements within ultra-low latency constraints: relative to the baseline, it raises the VoiceBench score from 67.24 to 73.41 while reducing the Time-to-First-Audio (TTFA) from 20.35 s to 3.65 s and the Real-Time Factor (RTF) from 7.04 to 1.05.
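To make the two latency metrics concrete, the following is a minimal Python sketch using the definitions commonly used for streaming speech systems: RTF as generation time divided by the duration of the audio produced, and TTFA as the delay between the request and the first audio chunk. These definitions, and the helper names `real_time_factor`, `time_to_first_audio`, and `relative_reduction`, are assumptions for illustration; the abstract does not specify the measurement protocol. Only the before/after figures are taken from the abstract.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF under the common definition: compute time per second of audio.

    An RTF near 1.0 means audio is produced about as fast as it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def time_to_first_audio(request_time: float, chunk_times: list[float]) -> float:
    """TTFA: delay between the request and the earliest emitted audio chunk."""
    if not chunk_times:
        raise ValueError("no audio chunks were emitted")
    return min(chunk_times) - request_time


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, e.g. 0.5 means it was halved."""
    return (before - after) / before


# Figures reported in the abstract (baseline -> Dual-Reasoner).
ttfa_drop = relative_reduction(20.35, 3.65)
rtf_drop = relative_reduction(7.04, 1.05)
voicebench_gain = 73.41 - 67.24

print(f"TTFA reduction: {ttfa_drop:.1%}")              # 82.1%
print(f"RTF reduction: {rtf_drop:.1%}")                # 85.1%
print(f"VoiceBench gain: {voicebench_gain:.2f} points")  # 6.17 points
```

Under these assumed definitions, the reported RTF of 1.05 corresponds to generation running roughly in step with playback, whereas the baseline RTF of 7.04 would take about seven seconds of compute per second of speech.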
Anthology ID:
2026.findings-acl.199
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4081–4105
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.199/
Cite (ACL):
Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haoyu Yang, Junboli, Jun Fang, Lin Li, and Qingyang Hong. 2026. Dual-Reasoner: Bridging Interleaved Atomicity and Streaming Latency via Thinking-while-Talking. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4081–4105, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Dual-Reasoner: Bridging Interleaved Atomicity and Streaming Latency via Thinking-while-Talking (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.199.pdf
Checklist:
2026.findings-acl.199.checklist.pdf