@inproceedings{ohta-etal-2022-joeys2t,
  % Brace-delimited values (nest safely, unlike quotes); whole-word brace
  % protection for the camelCase tool names so styles cannot downcase them.
  title     = {{JoeyS2T}: Minimalistic Speech-to-Text Modeling with {JoeyNMT}},
  author    = {Ohta, Mayumi and
               Kreutzer, Julia and
               Riezler, Stefan},
  editor    = {Che, Wanxiang and
               Shutova, Ekaterina},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  month     = dec,
  year      = {2022},
  address   = {Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
  % Canonical Anthology URL; the original pointed at a temporary preview
  % mirror (preview.aclanthology.org/moar-dois/...) that is not stable.
  url       = {https://aclanthology.org/2022.emnlp-demos.6/},
  doi       = {10.18653/v1/2022.emnlp-demos.6},
  pages     = {50--59},
  abstract  = {JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T{'}s workflow is self-contained, starting from data pre-processing, over model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT{'}s compact and simple code base. On top of JoeyNMT{'}s state-of-the-art Transformer-based Encoder-Decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC-loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and available on \url{https://github.com/may-/joeys2t}.},
}
Markdown (Informal)
[JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT](https://aclanthology.org/2022.emnlp-demos.6/) (Ohta et al., EMNLP 2022)
ACL
- Mayumi Ohta, Julia Kreutzer, and Stefan Riezler. 2022. JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 50–59, Abu Dhabi, UAE. Association for Computational Linguistics.