@inproceedings{li-etal-2025-continuous,
title = "Continuous Speech Tokenizer in Text To Speech",
author = "Li, Yixing and
Xie, Ruobing and
Sun, Xingwu and
Cheng, Yu and
Kang, Zhanhui",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.184/",
pages = "3341--3347",
ISBN = "979-8-89176-195-7",
abstract = "The fusion of speech and language in the era of large language models has garnered significant attention. Discrete speech token is often utilized in text-to-speech tasks for speech compression and portability, which is convenient for joint training with text and have good compression efficiency. However, we found that the discrete speech tokenizer still suffers from information loss. Therefore, we propose a simple yet effective continuous speech tokenizer named Cont-SPT, and a text-to-speech model based on continuous speech tokens. Our results show that the speech language model based on the continuous speech tokenizer has better continuity and higher estimated Mean Opinion Scores (MoS). This enhancement is attributed to better information preservation rate of the continuous speech tokenizer across both low and high frequencies in the frequency domain. The code and resources for Cont-SPT can be found in https://github.com/Yixing-Li/Continuous-Speech-Tokenizer."
}
Markdown (Informal)
[Continuous Speech Tokenizer in Text To Speech](https://aclanthology.org/2025.findings-naacl.184/) (Li et al., Findings 2025)
ACL
Yixing Li, Ruobing Xie, Xingwu Sun, Yu Cheng, and Zhanhui Kang. 2025. Continuous Speech Tokenizer in Text To Speech. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3341–3347, Albuquerque, New Mexico. Association for Computational Linguistics.
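
The abstract's core claim is that discrete tokenization loses information that a continuous tokenizer retains. The toy sketch below is not the paper's Cont-SPT implementation (the record here contains no architectural details); it only illustrates, under assumed toy dimensions and a random codebook, the quantization error a discrete (vector-quantized) tokenizer introduces compared with passing continuous features through unchanged.

```python
import numpy as np

# Illustrative sketch only: NOT the paper's Cont-SPT method, just a toy
# contrast between discrete (vector-quantized) and continuous tokenization.
rng = np.random.default_rng(0)

# Stand-in for frame-level speech features (e.g., encoder outputs):
# 200 frames, 16 dimensions (arbitrary toy sizes).
features = rng.normal(size=(200, 16))

# Discrete tokenizer: snap each frame to its nearest codebook entry
# (64-entry random codebook, again an arbitrary toy choice).
codebook = rng.normal(size=(64, 16))
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
token_ids = dists.argmin(axis=1)   # the "discrete speech tokens"
quantized = codebook[token_ids]    # what a downstream decoder would see

# Continuous tokenizer: keep the features unquantized. (In the paper this
# is a learned continuous representation; here we simply pass features
# through to make the information-loss point.)
continuous = features

print("discrete reconstruction MSE:  ", np.mean((features - quantized) ** 2))
print("continuous reconstruction MSE:", np.mean((features - continuous) ** 2))
# The discrete path incurs nonzero quantization error; the continuous path
# does not. This gap is the "information loss" the abstract refers to.
```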