Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino
Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows, from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T is available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
- Anthology ID: 2020.aacl-demo.6
- Volume: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations
- Month: December
- Year: 2020
- Address: Suzhou, China
- Editors: Derek Wong, Douwe Kiela
- Venue: AACL
- Publisher: Association for Computational Linguistics
- Pages: 33–39
- URL: https://aclanthology.org/2020.aacl-demo.6
- Cite (ACL): Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, and Juan Pino. 2020. Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations, pages 33–39, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq (Wang et al., AACL 2020)
- PDF: https://aclanthology.org/2020.aacl-demo.6.pdf
- Code: pytorch/fairseq + additional community code
- Data: CoVoST 2, LibriSpeech, MuST-C
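The end-to-end workflow the abstract describes (data pre-processing, model training, offline inference) can be sketched with the command-line recipes shipped under `examples/speech_to_text`. This is a hedged sketch of the LibriSpeech ASR recipe, not the authoritative one: the placeholder paths are mine, and exact flag names and hyperparameter values may differ between fairseq versions, so check the repository README before running.

```shell
# Placeholder paths (assumptions, not from the paper).
LS_ROOT=/path/to/librispeech
SAVE_DIR=/path/to/checkpoints

# 1. Data pre-processing: extract speech features, learn a subword
#    vocabulary, and write the TSV manifests plus a config YAML.
python examples/speech_to_text/prep_librispeech_data.py \
  --output-root ${LS_ROOT} --vocab-type unigram --vocab-size 10000

# 2. Model training with a small speech-to-text Transformer
#    (hyperparameters here are illustrative).
fairseq-train ${LS_ROOT} --save-dir ${SAVE_DIR} \
  --task speech_to_text --arch s2t_transformer_s \
  --config-yaml config.yaml \
  --train-subset train-clean-100 --valid-subset dev-clean \
  --criterion label_smoothed_cross_entropy \
  --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --max-tokens 40000 --max-update 300000

# 3. Offline inference on the test set, scored with word error rate.
fairseq-generate ${LS_ROOT} --config-yaml config.yaml \
  --task speech_to_text --gen-subset test-clean \
  --path ${SAVE_DIR}/checkpoint_best.pt \
  --max-tokens 50000 --beam 5 --scoring wer
```

The same three-step shape (a `prep_*_data.py` script, `fairseq-train`, `fairseq-generate`) applies to the MuST-C and CoVoST 2 speech translation recipes, with translation-specific subsets and BLEU scoring in place of WER.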