Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq

Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino


Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq’s careful design for scalability and extensibility. We provide end-to-end workflows, from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based and Transformer-based models and open-source detailed training recipes. Fairseq’s machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T is available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
Anthology ID: 2020.aacl-demo.6
Volume: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations
Month: December
Year: 2020
Address: Suzhou, China
Editors: Derek Wong, Douwe Kiela
Venue: AACL
Publisher: Association for Computational Linguistics
Pages: 33–39
URL: https://aclanthology.org/2020.aacl-demo.6
Cite (ACL): Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, and Juan Pino. 2020. Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations, pages 33–39, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq (Wang et al., AACL 2020)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2020.aacl-demo.6.pdf
Code: pytorch/fairseq (+ additional community code)
Data: CoVoST 2, LibriSpeech, MuST-C