Without Further Ado: Direct and Simultaneous Speech Translation by AppTek in 2021

Parnia Bahar, Patrick Wilken, Mattia A. Di Gangi, Evgeny Matusov


Abstract
This paper describes the offline and simultaneous speech translation systems developed at AppTek for IWSLT 2021. Our offline ST submission includes the direct end-to-end system and the so-called posterior tight integrated model, which is akin to the cascade system but is trained in an end-to-end fashion, where all the cascaded modules are end-to-end models themselves. For simultaneous ST, we combine hybrid automatic speech recognition with a machine translation approach whose translation policy decisions are learned from statistical word alignments. Compared to last year, we improve general quality and provide a wider range of quality/latency trade-offs, both due to a data augmentation method making the MT model robust to varying chunk sizes. Finally, we present a method for ASR output segmentation into sentences that introduces a minimal additional delay.
Anthology ID:
2021.iwslt-1.5
Volume:
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Month:
August
Year:
2021
Address:
Bangkok, Thailand (online)
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
52–63
URL:
https://aclanthology.org/2021.iwslt-1.5
DOI:
10.18653/v1/2021.iwslt-1.5
Cite (ACL):
Parnia Bahar, Patrick Wilken, Mattia A. Di Gangi, and Evgeny Matusov. 2021. Without Further Ado: Direct and Simultaneous Speech Translation by AppTek in 2021. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pages 52–63, Bangkok, Thailand (online). Association for Computational Linguistics.
Cite (Informal):
Without Further Ado: Direct and Simultaneous Speech Translation by AppTek in 2021 (Bahar et al., IWSLT 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.iwslt-1.5.pdf