Better Simultaneous Translation with Monotonic Knowledge Distillation

Shushu Wang, Jing Wu, Kai Fan, Wei Luo, Jun Xiao, Zhongqiang Huang


Abstract
Simultaneous machine translation (SiMT) presents a unique challenge because it requires generating target tokens before the source sentence is fully consumed. This can lead to the hallucination problem, where target tokens are generated without support from the source sentence. The prefix-to-prefix data used to train SiMT models can contribute to the problem: because word order diverges between the source and target languages, the source and target prefixes are not always parallel. In this paper, we propose a novel approach that uses traditional translation models as teachers and employs a two-stage beam search algorithm to generate monotonic yet accurate reference translations for sequence-level knowledge distillation. Experimental results demonstrate significant improvements over multiple strong SiMT baselines, leading to new state-of-the-art performance across various language pairs. Notably, on a monotonic version of the WMT15 De-En test set, whose references were produced in a more monotonic style by professional translators, our approach achieves even larger improvements over the baselines. The source code and data are publicly available.
Anthology ID:
2023.acl-long.131
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2334–2349
URL:
https://aclanthology.org/2023.acl-long.131
DOI:
10.18653/v1/2023.acl-long.131
Cite (ACL):
Shushu Wang, Jing Wu, Kai Fan, Wei Luo, Jun Xiao, and Zhongqiang Huang. 2023. Better Simultaneous Translation with Monotonic Knowledge Distillation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2334–2349, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Better Simultaneous Translation with Monotonic Knowledge Distillation (Wang et al., ACL 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.131.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.131.mp4