Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings
Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, Liang Huang
Abstract
Simultaneous translation is vastly different from full-sentence translation, in the sense that it starts translation before the source sentence ends, with only a few words' delay. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario due to the abundance of unnecessary long-distance reorderings in those bitexts. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Zh→En and Ja→En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.
- Anthology ID: 2021.emnlp-main.473
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 5857–5864
- URL: https://aclanthology.org/2021.emnlp-main.473
- DOI: 10.18653/v1/2021.emnlp-main.473
- Cite (ACL): Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, and Liang Huang. 2021. Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5857–5864, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings (Chen et al., EMNLP 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.473.pdf
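The abstract attributes the difficulty of training simultaneous systems on full-sentence bitexts to long-distance reorderings. As a minimal illustrative sketch (not the authors' code), one way to quantify such reorderings in a sentence pair is to measure the largest backward jump in source order when reading the word alignment in target order; the alignment format (source-target index pairs, as produced by tools like fast_align) and this particular distance metric are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch, assuming alignments as (src_idx, tgt_idx) pairs.
# A large backward jump means a target word depends on a source word far
# behind the current reading position, forcing a simultaneous system to wait.

def max_reordering_distance(alignment):
    """Largest backward jump in source position when traversing the
    alignment in target order. 0 means a fully monotonic pair."""
    by_target = sorted(alignment, key=lambda p: (p[1], p[0]))
    max_dist = 0
    max_src_seen = -1
    for src, _tgt in by_target:
        if src < max_src_seen:
            # This target word aligns to a source word we passed long ago.
            max_dist = max(max_dist, max_src_seen - src)
        max_src_seen = max(max_src_seen, src)
    return max_dist

# Hypothetical example: target word 0 aligns to source word 3,
# a backward jump of 3 that a small-delay system cannot produce monotonically.
example = [(3, 0), (0, 1), (1, 2), (2, 3)]
print(max_reordering_distance(example))  # 3
```

Under this kind of metric, a corpus-level filter or a rewriting step that lowers the reordering distance of target references would yield the "simultaneous-style" pseudo-references the abstract describes, though the paper's actual generation procedure should be consulted in the PDF above.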