Wenxuan Wang
2022
Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages
Wenxiang Jiao | Zhaopeng Tu | Jiarui Li | Wenxuan Wang | Jen-tse Huang | Shuming Shi
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes Tencent’s multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the constrained translation track, in which only the data and pretrained models provided by the organizer are allowed. The task is challenging due to three problems: the absence of training data for some of the evaluated language pairs, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won first place on the blind test sets in terms of the automatic evaluation metrics. Code, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
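The distributionally robust optimization mentioned in the abstract is typically realized by re-weighting language pairs according to their current losses, so that harder pairs receive more training weight. The snippet below is a minimal, self-contained sketch of the exponentiated-gradient weight update commonly used in group DRO; the language pairs, loss values, and step size are hypothetical and are not taken from the paper or its repository.

```python
import math

def dro_update_weights(weights, losses, step_size=0.1):
    """One exponentiated-gradient step of group DRO:
    language pairs with higher loss receive more training weight."""
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical per-language-pair losses for one training round.
pairs = ["eng-swh", "eng-hau", "eng-yor"]
weights = [1.0 / len(pairs)] * len(pairs)
losses = [2.1, 3.4, 2.8]

weights = dro_update_weights(weights, losses)
for pair, weight in zip(pairs, weights):
    print(f"{pair}: {weight:.3f}")  # the hardest pair gets the largest weight
```

In practice the updated weights would drive data sampling or loss scaling for the next training round, but that loop is omitted here.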
Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation
Wenxuan Wang | Wenxiang Jiao | Yongchang Hao | Xing Wang | Shuming Shi | Zhaopeng Tu | Michael Lyu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we take a substantial step toward better understanding state-of-the-art sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose two simple and effective strategies, in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness over Seq2Seq pretraining.
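As a rough illustration of what in-domain pretraining involves, the sketch below builds a BART/mBART-style denoising example from in-domain monolingual text: the span-masked sentence becomes the encoder input and the original sentence the reconstruction target. The mask ratio, helper name, and example sentence are illustrative assumptions rather than the authors' exact recipe, and the paper's input adaptation strategy is not shown.

```python
import random

def mask_span(tokens, mask_ratio=0.35, mask_token="<mask>"):
    """Turn in-domain monolingual text into a denoising example:
    the span-masked sentence is the encoder input, the original is the target."""
    n_masked = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(len(tokens) - n_masked + 1)
    noised = tokens[:start] + [mask_token] + tokens[start + n_masked:]
    return noised, tokens

sentence = "the committee approved the revised trade agreement on friday".split()
source, target = mask_span(sentence)
print("encoder input :", " ".join(source))
print("decoder target:", " ".join(target))
```

Continuing the denoising objective on text drawn from the target translation domain, before switching to parallel-data finetuning, is the general idea behind in-domain pretraining.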
2020
Rethinking the Value of Transformer Components
Wenxuan Wang | Zhaopeng Tu
Proceedings of the 28th International Conference on Computational Linguistics
The Transformer has become the state-of-the-art translation model, yet how each intermediate component contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures. In this work, we bridge this gap by evaluating the impact of individual components (sub-layers) in trained Transformer models from different perspectives. Experimental results across language pairs, training strategies, and model capacities show that certain components are consistently more important than others. We also report a number of interesting findings that may help researchers better analyze, understand, and improve Transformer models. Based on these observations, we further propose a new training strategy that improves translation performance by distinguishing the unimportant components during training.
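One common way to probe a sub-layer's contribution, in the spirit of this analysis, is to mask its output in a trained model and measure how much the model's behavior changes. The PyTorch snippet below is a minimal sketch of that idea on a toy encoder; the model, sizes, and the output-difference proxy are illustrative assumptions and do not reproduce the paper's evaluation, which scores components by translation quality.

```python
import torch
import torch.nn as nn

# Toy Transformer encoder standing in for a trained translation model.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
x = torch.randn(8, 10, 32)  # (batch, sequence length, model dimension)

def zero_output_hook(module, inputs, output):
    """Replace the sub-layer's output with zeros, so the residual
    connection passes its input through unchanged."""
    if isinstance(output, tuple):  # nn.MultiheadAttention returns a tuple
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

with torch.no_grad():
    baseline = encoder(x)
    # Ablate the self-attention sub-layer of the first encoder layer.
    handle = encoder.layers[0].self_attn.register_forward_hook(zero_output_hook)
    ablated = encoder(x)
    handle.remove()

# Output change as a crude importance proxy; the paper instead
# measures the drop in translation quality per ablated component.
print("importance proxy:", (baseline - ablated).abs().mean().item())
```

Repeating the loop over every sub-layer yields a per-component importance profile of the kind the paper analyzes.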
Co-authors
- Zhaopeng Tu 3
- Wenxiang Jiao 2
- Shuming Shi 2
- Jiarui Li 1
- Jen-tse Huang 1