Alibaba Speech Translation Systems for IWSLT 2018

Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang


Abstract
This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018. To improve ASR performance, we build multiple ASR models, including conventional and end-to-end models, and apply model fusion in the final step. ASR pre- and post-processing techniques such as speech segmentation, punctuation insertion, and sentence splitting are found to be very useful for MT. We also employ most of the techniques that proved effective in the WMT 2018 evaluation, such as BPE, back translation, data selection, model ensembling, and reranking. Combined, these ASR and MT techniques significantly improve speech translation quality.
Anthology ID:
2018.iwslt-1.20
Volume:
Proceedings of the 15th International Conference on Spoken Language Translation
Month:
October 29-30
Year:
2018
Address:
Brussels
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
International Conference on Spoken Language Translation
Pages:
136–141
URL:
https://aclanthology.org/2018.iwslt-1.20
Cite (ACL):
Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, and Fei Huang. 2018. Alibaba Speech Translation Systems for IWSLT 2018. In Proceedings of the 15th International Conference on Spoken Language Translation, pages 136–141, Brussels. International Conference on Spoken Language Translation.
Cite (Informal):
Alibaba Speech Translation Systems for IWSLT 2018 (Bach et al., IWSLT 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2018.iwslt-1.20.pdf