Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino
Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows, from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T is available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
- Anthology ID: 2020.aacl-demo.6
- Volume: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations
- Month: December
- Year: 2020
- Address: Suzhou, China
- Editors: Derek Wong, Douwe Kiela
- Venue: AACL
- Publisher: Association for Computational Linguistics
- Pages: 33–39
- URL: https://aclanthology.org/2020.aacl-demo.6
- Cite (ACL): Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, and Juan Pino. 2020. Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations, pages 33–39, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq (Wang et al., AACL 2020)
- PDF: https://aclanthology.org/2020.aacl-demo.6.pdf
- Code: pytorch/fairseq + additional community code
- Data: CoVoST 2, LibriSpeech, MuST-C
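The end-to-end workflow the abstract describes (data pre-processing, model training, offline inference) can be sketched with the command-line recipes shipped under `examples/speech_to_text`. This is a hedged sketch of the LibriSpeech ASR recipe, not the authoritative one: the placeholder paths are mine, and exact flag names and hyperparameter values may differ between fairseq versions, so check the repository README before running.

```shell
# Placeholder paths (assumptions, not from the paper).
LS_ROOT=/path/to/librispeech
SAVE_DIR=/path/to/checkpoints

# 1. Data pre-processing: extract speech features, learn a subword
#    vocabulary, and write the TSV manifests plus a config YAML.
python examples/speech_to_text/prep_librispeech_data.py \
  --output-root ${LS_ROOT} --vocab-type unigram --vocab-size 10000

# 2. Model training with a small speech-to-text Transformer
#    (hyperparameters here are illustrative).
fairseq-train ${LS_ROOT} --save-dir ${SAVE_DIR} \
  --task speech_to_text --arch s2t_transformer_s \
  --config-yaml config.yaml \
  --train-subset train-clean-100 --valid-subset dev-clean \
  --criterion label_smoothed_cross_entropy \
  --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 10000 --max-tokens 40000 --max-update 300000

# 3. Offline inference on the test set, scored with word error rate.
fairseq-generate ${LS_ROOT} --config-yaml config.yaml \
  --task speech_to_text --gen-subset test-clean \
  --path ${SAVE_DIR}/checkpoint_best.pt \
  --max-tokens 50000 --beam 5 --scoring wer
```

The same three-step shape (a `prep_*_data.py` script, `fairseq-train`, `fairseq-generate`) applies to the MuST-C and CoVoST 2 speech translation recipes, with translation-specific subsets and BLEU scoring in place of WER.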