SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Wenxi Chen, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Wenhanlin, Shunshun Yin, Ming Tao, Xinsheng Wang, Xie Chen


Abstract
Speech codecs that convert continuous speech signals into discrete tokens have become essential for speech language models. However, existing codecs struggle to balance high-quality reconstruction with semantically rich representations, limiting their effectiveness in both generative and understanding tasks. In this work, we propose SAC, a neural speech codec with semantic-acoustic dual-stream quantization. By disentangling semantic and acoustic modeling into two dedicated streams, SAC enables each to be optimized for its respective role. Comprehensive evaluations show that SAC achieves strong reconstruction performance across diverse bitrates under both clean and noisy conditions, with particularly high scores on UTMOS and WER, indicating superior naturalness and intelligibility. Moreover, SAC substantially surpasses prior codecs in semantic representation, approaching the level of continuous self-supervised embeddings. When used as a tokenizer for LLM-based text-to-speech, SAC enables a single-stage autoregressive (AR) TTS model that clearly outperforms state-of-the-art AR systems. Our disentanglement analysis further validates the effectiveness of the dual-stream design, offering new potential for controllable speech generation.
Anthology ID:
2026.acl-long.138
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3030–3048
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.138/
Cite (ACL):
Wenxi Chen, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Wenhanlin, Shunshun Yin, Ming Tao, Xinsheng Wang, and Xie Chen. 2026. SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3030–3048, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization (Chen et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.138.pdf
Checklist:
 2026.acl-long.138.checklist.pdf