PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, Liang Huang


Abstract
PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the design philosophy and core architecture of PaddleSpeech to support several essential speech-to-text and text-to-speech tasks. PaddleSpeech achieves competitive or state-of-the-art performance on various speech datasets and implements the most popular methods. It also provides recipes and pretrained models to quickly reproduce the experimental results in this paper. PaddleSpeech is publicly available at https://github.com/PaddlePaddle/PaddleSpeech.
Anthology ID:
2022.naacl-demo.12
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Editors:
Hannaneh Hajishirzi, Qiang Ning, Avi Sil
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
114–123
URL:
https://preview.aclanthology.org/icon-24-ingestion/2022.naacl-demo.12/
DOI:
10.18653/v1/2022.naacl-demo.12
Cite (ACL):
Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, and Liang Huang. 2022. PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 114–123, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit (Zhang et al., NAACL 2022)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2022.naacl-demo.12.pdf
Video:
https://preview.aclanthology.org/icon-24-ingestion/2022.naacl-demo.12.mp4
Code:
PaddlePaddle/PaddleSpeech + additional community code
Data:
AISHELL-1, AISHELL-3, AudioSet, ESC-50, LJSpeech, LibriSpeech, MuST-C, VoxCeleb1, VoxCeleb2, WenetSpeech