Boris Ginsburg
2023
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023
Oleksii Hrinchuk | Vladimir Bataev | Evelina Bakhturina | Boris Ginsburg
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2023 Offline Speech Translation Task. This year, we focused on an end-to-end system that capitalizes on pre-trained models and synthetic data to mitigate the scarcity of direct speech translation data. When trained on IWSLT 2022 constrained data, our best En->De end-to-end model achieves an average score of 31 BLEU on 7 test sets from IWSLT 2010-2020, improving over our last year’s cascade (28.4) and end-to-end (25.7) submissions. When trained on IWSLT 2023 constrained data, the average score drops to 29.5 BLEU.
2018
OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models
Oleksii Kuchaiev | Boris Ginsburg | Igor Gitman | Vitaly Lavrukhin | Carl Case | Paulius Micikevicius
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
We present OpenSeq2Seq – an open-source toolkit for training sequence-to-sequence models. The main goal of our toolkit is to allow researchers to explore different sequence-to-sequence architectures as effectively as possible. Efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq provides building blocks for training encoder-decoder models for neural machine translation and automatic speech recognition. We plan to extend it with other modalities in the future.