NeMo@IWSLT 2026: Cascaded System for Simultaneous Speech Translation

Lilit Grigoryan; Vladimir Bataev; Andrei Andrusenko; Oleksii Hrinchuk; Davit Karamyan; Enas Albasiri; Vitaly Lavrukhin; Nikolay Karpov; Boris Ginsburg

doi:10.18653/v1/2026.iwslt-1.23

Abstract

This paper describes the NVIDIA NeMo team’s submission to the IWSLT 2026 Simultaneous Speech Translation (SimulST) tracks. We use a cascaded architecture combining a dual-mode Unified ASR Transducer model with a multilingual Large Language Model (LLM). The ASR model is trained to deliver stable transcriptions across a wide range of latencies, providing a reliable foundation for high-quality LLM translation. Our submission participates in the English–German, English–Italian, and English–Chinese tasks, in both standard and contextualized settings, as well as the Czech–English standard track, covering both low- and high-latency scenarios. We further analyze how ASR and LLM design choices affect the system’s overall latency and translation quality.

Anthology ID: 2026.iwslt-1.23
Volume: Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month: July
Year: 2026
Address: San Diego, USA (in-person and online)
Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues: IWSLT | WS
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 204–211
URL: https://preview.aclanthology.org/corrections-2026-06/2026.iwslt-1.23/
DOI: 10.18653/v1/2026.iwslt-1.23
Cite (ACL): Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Oleksii Hrinchuk, Davit Karamyan, Enas Albasiri, Vitaly Lavrukhin, Nikolay Karpov, and Boris Ginsburg. 2026. NeMo@IWSLT 2026: Cascaded System for Simultaneous Speech Translation. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 204–211, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal): NeMo@IWSLT 2026: Cascaded System for Simultaneous Speech Translation (Grigoryan et al., IWSLT 2026)
PDF: https://preview.aclanthology.org/corrections-2026-06/2026.iwslt-1.23.pdf
