2021
HW-TSC’s Participation in the WMT 2021 News Translation Shared Task
Daimeng Wei | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including Zh/En, De/En, Ja/En, Ha/En, Is/En, Hi/Bn, and Xh/Zu, in both directions under the constrained condition. We use the Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. We train our models with several commonly used strategies, such as Back Translation, Forward Translation, Multilingual Translation, and Ensemble Knowledge Distillation. Our submission obtains competitive results in the final evaluation.
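As a rough illustration of the back-translation step mentioned above, the sketch below shows how synthetic parallel data is typically built from target-side monolingual text; the helper names and the stub "reverse model" are placeholders, not the authors' actual pipeline.

```python
# Sketch of back-translation data augmentation (hypothetical helper names;
# the actual HW-TSC pipeline is not described at this level in the abstract).

def translate_reverse(sentences):
    """Stand-in for a trained target->source NMT model."""
    # In practice this would call a trained reverse Transformer;
    # here we simply echo the input so the sketch runs end to end.
    return [s for s in sentences]

def build_back_translated_corpus(target_monolingual):
    """Pair synthetic source sentences with authentic target sentences."""
    synthetic_source = translate_reverse(target_monolingual)
    # Each (synthetic source, authentic target) pair is added to the
    # parallel training data alongside the original bitext.
    return list(zip(synthetic_source, target_monolingual))

if __name__ == "__main__":
    mono = ["Das ist ein Beispiel .", "Noch ein Satz ."]
    for src, tgt in build_back_translated_corpus(mono):
        print(src, "\t", tgt)
```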
HW-TSC’s Participation in the WMT 2021 Triangular MT Shared Task
Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Zhengzhe Yu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Triangular MT Shared Task. We participate in the Russian-to-Chinese task under the constrained condition. We use the Transformer architecture and obtain the best performance via a variant with a larger parameter size. We perform detailed data pre-processing and filtering on the provided large-scale bilingual data. We train our models with several strategies, such as Multilingual Translation, Back Translation, Forward Translation, Data Denoising, Checkpoint Averaging, Ensembling, and Fine-tuning. Our system obtains 32.5 BLEU on the dev set and 27.7 BLEU on the test set, the highest score among all submissions.
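Checkpoint averaging, one of the strategies listed above, is commonly done by taking an element-wise mean of the parameters of the last few checkpoints. A minimal sketch, assuming PyTorch state_dict checkpoints (paths and key layout are illustrative, not the authors' setup):

```python
# Minimal checkpoint-averaging sketch (illustrative, not the authors' code).
import sys
import torch

def average_checkpoints(paths):
    """Element-wise average of the parameters in several checkpoints."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

if __name__ == "__main__":
    averaged = average_checkpoints(sys.argv[1:])
    torch.save(averaged, "averaged.pt")
```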
HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task
Zhengzhe Yu | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task. We participate in Small Track #2, which covers 6 languages: Javanese (Jv), Indonesian (Id), Malay (Ms), Tagalog (Tl), Tamil (Ta) and English (En), with 30 directions under the constrained condition. We use the Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We train a single multilingual model to translate all 30 directions. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. We train our models with several commonly used strategies, such as Back Translation, Forward Translation, Ensemble Knowledge Distillation, and Adapter Fine-tuning. Our model ultimately obtains competitive results.
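A single model serving all 30 directions usually relies on a target-language tag prepended to each source sentence. The sketch below illustrates that convention; the tag format is an assumption for illustration, not necessarily the one HW-TSC used.

```python
# Hypothetical target-language tagging for a single many-to-many model.
LANGS = ["jv", "id", "ms", "tl", "ta", "en"]

def tag_example(src_sentence, tgt_lang):
    """Prepend a target-language token so one model can serve all directions."""
    assert tgt_lang in LANGS
    return f"<2{tgt_lang}> {src_sentence}"

if __name__ == "__main__":
    print(tag_example("Saya suka membaca buku .", "en"))
    print(tag_example("I like reading books .", "ta"))
```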
HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small-size models that strike a balance between efficiency and quality. Our models feature a deep encoder, a shallow decoder and a lightweight RNN with SSRU layers. We use Huawei Noah's Bolt, an efficient and lightweight library for on-device inference. Leveraging INT8 quantization, a self-defined General Matrix Multiplication (GEMM) operator, a shortlist, greedy search and caching, we submit four small and efficient translation models with high translation quality for the one-CPU-core latency track.
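To make the INT8 step concrete, here is a small sketch of symmetric per-tensor INT8 quantization of a weight matrix. It is a generic illustration of the idea, not Bolt's actual GEMM or quantization code.

```python
# Generic symmetric INT8 quantization sketch (not Bolt's internal code).
import numpy as np

def quantize_int8(weights):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, s)).max())
```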
HW-TSC’s Submissions to the WMT21 Biomedical Translation Task
Hao Yang | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Daimeng Wei | Zongyao Li | Hengchao Shang | Minghan Wang | Jiaxin Guo | Lizhi Lei | Chuanfei Xu | Min Zhang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper describes the submission of Huawei Translation Services Center (HW-TSC) to the WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (our registered team name is HuaweiTSC). Technical details are introduced in this paper, including the model framework, data pre-processing methods and model enhancement strategies. In addition, using the WMT20 OK-aligned biomedical test set, we compare and analyze system performances under different strategies. On the WMT21 biomedical translation task, our systems in the English→Chinese and English→German directions obtain the highest BLEU scores among all submissions according to the official evaluation results.
HI-CMLM: Improve CMLM with Hybrid Decoder Input
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Daimeng Wei | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 14th International Conference on Natural Language Generation
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that its mechanism of predicting all of the target words depending only on the hidden states of [MASK] is neither effective nor efficient in the initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence. In this work, we mitigate this problem by combining the copied source with the embeddings of [MASK] in the decoder. Notably, this is not the straightforward copying that has been shown to be useless, but a novel heuristic hybrid strategy: fence-mask. Experimental results show that it gains consistent boosts on both the WMT14 En<->De and WMT16 En<->Ro corpora, by 0.5 BLEU on average and 1 BLEU for less-informative short sentences. This reveals that incorporating additional information through proper strategies is beneficial for improving CMLM, particularly the translation quality of short texts, and speeds up early-stage convergence.
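The abstract does not spell out the fence-mask layout; the sketch below shows one plausible reading, interleaving copied source tokens with [MASK] positions in the decoder input. The alternating pattern is an assumption for illustration, not the pattern defined in the HI-CMLM paper.

```python
# Illustrative hybrid decoder input: interleave copied source tokens with
# [MASK] tokens. The exact fence-mask layout is an assumption, not the
# construction defined in the HI-CMLM paper.
MASK = "[MASK]"

def fence_mask_input(src_tokens, tgt_len):
    """Build a decoder input of length tgt_len mixing copies and masks."""
    out = []
    for i in range(tgt_len):
        if i % 2 == 0 and i // 2 < len(src_tokens):
            out.append(src_tokens[i // 2])   # copied source token
        else:
            out.append(MASK)                 # position left for prediction
    return out

if __name__ == "__main__":
    print(fence_mask_input(["wir", "mögen", "bücher"], 6))
```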
2020
HW-TSC’s Participation in the WMT 2020 News Translation Shared Task
Daimeng Wei | Hengchao Shang | Zhanglin Wu | Zhengzhe Yu | Liangyou Li | Jiaxin Guo | Minghan Wang | Hao Yang | Lizhi Lei | Ying Qin | Shiliang Sun
Proceedings of the Fifth Conference on Machine Translation
This paper presents our work in the WMT 2020 News Translation Shared Task. We participate in 3 language pairs, including Zh/En, Km/En, and Ps/En, in both directions under the constrained condition. We use the standard Transformer-Big model as the baseline and obtain the best performance via two variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. We train our models with several commonly used strategies, such as Back Translation and Ensemble Knowledge Distillation. We also conduct experiments with similar-language augmentation, which lead to positive results, although they are not used in our submission. Our submission obtains remarkable results in the final evaluation.
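As a rough illustration of ensemble knowledge distillation, the sketch below averages the output distributions of several teacher models and trains a student against that soft target. The toy tensors are placeholders, not the systems described in the paper.

```python
# Toy ensemble knowledge distillation step (placeholder tensors, not the
# actual HW-TSC systems).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list):
    """KL divergence between the student and the averaged teacher distribution."""
    teacher_probs = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")

if __name__ == "__main__":
    vocab, batch = 8, 2
    student = torch.randn(batch, vocab, requires_grad=True)
    teachers = [torch.randn(batch, vocab) for _ in range(3)]
    loss = distillation_loss(student, teachers)
    loss.backward()
    print(float(loss))
```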
HW-TSC’s Participation at WMT 2020 Automatic Post Editing Shared Task
Hao Yang | Minghan Wang | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen
Proceedings of the Fifth Conference on Machine Translation
This paper presents the submission by HW-TSC to the WMT 2020 Automatic Post Editing Shared Task. We participate in the English-German and English-Chinese language pairs. Our system is built on a Transformer pre-trained on the WMT 2019 and WMT 2020 News Translation corpora and fine-tuned on the APE corpus. Bottleneck Adapter Layers are integrated into the model to prevent over-fitting. We further collect external translations as augmented MT candidates to improve performance. The experiments demonstrate that pre-trained NMT models are effective when fine-tuned on an APE corpus of limited size, and that performance can be further improved with external MT augmentation. Our system achieves competitive results in both directions in the final evaluation.
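Bottleneck Adapter Layers follow a down-project / non-linearity / up-project pattern with a residual connection. A minimal sketch in this spirit, with hidden and bottleneck sizes chosen arbitrarily rather than taken from the paper:

```python
# Minimal bottleneck adapter sketch (dimensions are illustrative).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size=512, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x):
        # The residual connection preserves the pre-trained representation;
        # only the small adapter weights need updating during fine-tuning.
        return x + self.up(self.act(self.down(x)))

if __name__ == "__main__":
    adapter = BottleneckAdapter()
    print(adapter(torch.randn(2, 10, 512)).shape)
```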
HW-TSC’s Participation at WMT 2020 Quality Estimation Shared Task
Minghan Wang | Hao Yang | Hengchao Shang | Daimeng Wei | Jiaxin Guo | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen | Liangyou Li
Proceedings of the Fifth Conference on Machine Translation
This paper presents our work in the WMT 2020 Word- and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task. Our system follows the standard Predictor-Estimator architecture, with a pre-trained Transformer as the Predictor and task-specific classifiers and regressors as the Estimators. We integrate Bottleneck Adapter Layers into the Predictor to improve transfer-learning efficiency and prevent over-fitting. At the same time, we jointly train the word- and sentence-level tasks in a unified model with multitask learning. We propose Pseudo-PE assisted QE (PEAQE), which results in significant improvements in performance. Our submissions achieve competitive results in the word- and sentence-level sub-tasks for both the En-De and En-Zh language pairs.
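To make the multitask Estimator side concrete, the sketch below puts a word-level tag classifier and a sentence-level regressor on top of shared Predictor features. The feature size, pooling and heads are illustrative assumptions, not the submitted system.

```python
# Illustrative Estimator heads over Predictor features (hypothetical sizes).
import torch
import torch.nn as nn

class Estimators(nn.Module):
    def __init__(self, feature_size=768, num_word_tags=2):
        super().__init__()
        self.word_classifier = nn.Linear(feature_size, num_word_tags)  # OK/BAD tags
        self.sent_regressor = nn.Linear(feature_size, 1)               # sentence score

    def forward(self, token_features):
        # token_features: (batch, seq_len, feature_size) from the Predictor.
        word_logits = self.word_classifier(token_features)
        sentence_score = self.sent_regressor(token_features.mean(dim=1))
        return word_logits, sentence_score.squeeze(-1)

if __name__ == "__main__":
    heads = Estimators()
    w, s = heads(torch.randn(2, 12, 768))
    print(w.shape, s.shape)
```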
HW-TSC’s Participation in the WAT 2020 Indic Languages Multilingual Task
Zhengzhe Yu | Zhanglin Wu | Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Minghan Wang | Liangyou Li | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 7th Workshop on Asian Translation
This paper describes our work in the WAT 2020 Indic Multilingual Translation Task. We participated in all 7 language pairs (En<->Bn/Hi/Gu/Ml/Mr/Ta/Te) in both directions under the constrained condition, using only the officially provided data. Using the Transformer as a baseline, our Multi->En and En->Multi translation systems achieve the best performances. Detailed data filtering and data domain selection are the keys to performance enhancement in our experiments, with an average improvement of 2.6 BLEU for each language pair in the En->Multi system and an average improvement of 4.6 BLEU in the Multi->En system. In addition, we employ language-independent adapters to further improve system performance. Our submission obtains competitive results in the final evaluation.
The HW-TSC Video Speech Translation System at IWSLT 2020
Minghan Wang | Hao Yang | Yao Deng | Ying Qin | Lizhi Lei | Daimeng Wei | Hengchao Shang | Ning Xie | Xiaochun Li | Jiaxian Guo
Proceedings of the 17th International Conference on Spoken Language Translation
This paper presents the details of our system in the IWSLT 2020 Video Speech Translation evaluation. The system works in a cascaded form and contains three modules: 1) a proprietary ASR system; 2) a disfluency correction system that aims to remove interregnums and other disfluent expressions with a fine-tuned BERT and a series of rule-based algorithms; and 3) an NMT system based on the Transformer and trained on a massive publicly available corpus.
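A schematic of the cascade data flow, with all three modules stubbed out; the real ASR, BERT-based disfluency correction and NMT components are proprietary or far larger than this sketch, so every function here is a placeholder.

```python
# Cascade speech translation sketch: ASR -> disfluency correction -> NMT.
# All three components are stubs standing in for the real modules.

def asr(audio_path):
    """Stand-in for the proprietary ASR system."""
    return "uh we we ship the model today"

def remove_disfluencies(text):
    """Stand-in for the fine-tuned BERT + rule-based disfluency correction."""
    fillers = {"uh", "um"}
    cleaned = []
    for tok in text.split():
        if tok in fillers or (cleaned and tok == cleaned[-1]):
            continue  # drop filler words and immediate repetitions
        cleaned.append(tok)
    return " ".join(cleaned)

def translate(text):
    """Stand-in for the Transformer NMT system."""
    return f"<translation of: {text}>"

if __name__ == "__main__":
    print(translate(remove_disfluencies(asr("talk.wav"))))
```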