Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings

Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, Liang Huang


Abstract
Simultaneous translation is vastly different from full-sentence translation, in the sense that it starts translating before the source sentence ends, with a delay of only a few words. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario because of the abundance of unnecessary long-distance reorderings in those bitexts. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Zh→En and Ja→En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.
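The "unnecessary long-distance reorderings" the abstract refers to can be quantified from word alignments: under a wait-k policy, any target word aligned to a source word outside the already-read prefix must be anticipated. Below is a minimal illustrative sketch of such a count, assuming alignments are given as 0-indexed (source, target) index pairs; the function name and interface are hypothetical, not the authors' code.

```python
def count_k_anticipations(alignment, k):
    """Count target words that a wait-k policy would have to anticipate.

    Under wait-k, the target word at (0-indexed) position t is emitted
    after reading the first t + k source words, i.e. source indices
    0 .. t+k-1. A target word aligned to a later source position cannot
    yet be grounded in the input and must be guessed ("anticipated").

    alignment: iterable of (src_idx, tgt_idx) pairs, both 0-indexed.
    """
    return sum(1 for s, t in alignment if s > t + k - 1)


# Monotone alignment: nothing to anticipate even at k=1.
assert count_k_anticipations([(0, 0), (1, 1), (2, 2)], k=1) == 0

# Long-distance reordering: the first target word aligns to the last
# source word, so wait-1 must anticipate it; wait-3 need not.
assert count_k_anticipations([(2, 0), (0, 1), (1, 2)], k=1) == 1
assert count_k_anticipations([(2, 0), (0, 1), (1, 2)], k=3) == 0
```

Pseudo-references with fewer such anticipations are, by this measure, better suited as training targets for a simultaneous model than the original free translations.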
Anthology ID:
2021.emnlp-main.473
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5857–5864
URL:
https://aclanthology.org/2021.emnlp-main.473
DOI:
10.18653/v1/2021.emnlp-main.473
Cite (ACL):
Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, and Liang Huang. 2021. Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5857–5864, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings (Chen et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.473.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.473.mp4