WeTS: A Benchmark for Translation Suggestion

Zhen Yang, Fandong Meng, Yingxue Zhang, Ernan Li, Jie Zhou


Abstract
Translation suggestion (TS), which provides alternatives for specific words or phrases given the entire document generated by machine translation (MT), has been shown to play a significant role in post-editing (PE). Existing research in this line has two main pitfalls. First, most conventional work focuses only on the overall performance of PE and ignores the performance of TS itself, which makes progress in PE sluggish and less explainable. Second, because no publicly available golden dataset exists to support in-depth research on TS, almost all previous work conducts experiments on in-house datasets or on noisy, automatically built datasets, which makes the experiments hard to reproduce and compare. To overcome these limitations and spur research in TS, we create a benchmark dataset, called WeTS, which is a golden corpus annotated by expert translators in four translation directions. Apart from the golden corpus, we also propose several methods for generating synthetic corpora, which can be used to improve performance substantially through pre-training. As for the model, we propose a segment-aware self-attention based Transformer for TS. Experimental results show that our approach achieves the best results in all four directions: English-to-German, German-to-English, Chinese-to-English, and English-to-Chinese.
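The task setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TSExample` fields, the `<mask>` and `<sep>` tokens, the whitespace tokenization, and the toy German-to-English sentence pair are hypothetical and are not taken from the WeTS data or the authors' code. The sketch only shows one plausible way to present a source sentence and an MT output with a flagged span to a suggestion model.

```python
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical placeholder token (assumption, not from the paper)


@dataclass
class TSExample:
    source: str       # source-language sentence
    translation: str  # MT output containing the span to be revised
    span_start: int   # token index where the flagged span begins (inclusive)
    span_end: int     # token index where the flagged span ends (exclusive)
    suggestion: str   # reference alternative for the flagged span


def build_model_input(ex: TSExample, sep: str = " <sep> ") -> str:
    """Join the source with the MT output whose flagged span is masked."""
    tokens = ex.translation.split()  # naive whitespace tokenization (assumption)
    if not 0 <= ex.span_start < ex.span_end <= len(tokens):
        raise ValueError("span indices out of range")
    masked = tokens[:ex.span_start] + [MASK] + tokens[ex.span_end:]
    return ex.source + sep + " ".join(masked)


# Made-up toy example: the MT output mistranslates "Buch" as "table".
example = TSExample(
    source="Er hat das Buch gelesen .",
    translation="He has read the table .",
    span_start=4,
    span_end=5,
    suggestion="book",
)
model_input = build_model_input(example)
print(model_input)
```

A suggestion model would then be trained to produce `example.suggestion` for the masked position; how the actual model encodes the span is described in the paper, not in this sketch.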
Anthology ID:
2022.emnlp-main.353
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5278–5290
URL:
https://aclanthology.org/2022.emnlp-main.353
Cite (ACL):
Zhen Yang, Fandong Meng, Yingxue Zhang, Ernan Li, and Jie Zhou. 2022. WeTS: A Benchmark for Translation Suggestion. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5278–5290, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
WeTS: A Benchmark for Translation Suggestion (Yang et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.353.pdf