Ting Zhu


2023

Multifaceted Challenge Set for Evaluating Machine Translation Performance
Xiaoyu Chen | Daimeng Wei | Zhanglin Wu | Ting Zhu | Hengchao Shang | Zongyao Li | Jiaxin Guo | Ning Xie | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

Machine translation evaluation is critical to machine translation research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and efficient evaluation method is necessary. Many researchers have raised questions about currently available evaluation metrics from various perspectives and proposed suggestions accordingly. However, to our knowledge, few researchers have analyzed the difficulty level of source sentences and its influence on evaluation results. This paper presents HW-TSC’s submission to the WMT23 MT Test Suites shared task. We propose a systematic approach for constructing challenge sets from four aspects: word difficulty, length difficulty, grammar difficulty and model learning difficulty. We open-source two Multifaceted Challenge Sets for Zh→En and En→Zh. We also present results of participants in this year’s General MT shared task on our test sets.

2022

HW-TSC’s Participation in the IWSLT 2022 Isometric Spoken Language Translation
Zongyao Li | Jiaxin Guo | Daimeng Wei | Hengchao Shang | Minghan Wang | Ting Zhu | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output on a given target-source length-ratio class; ii) enriching the Transformer positional embedding with length information; and iii) length control decoding for non-autoregressive translation. Our submissions achieve 30.7, 41.6 and 36.7 BLEU on the tst-COMMON test sets for the English-German, English-French and English-Spanish tasks, respectively, and comply 100% with the length requirements.

Exploring Robustness of Machine Translation Metrics: A Study of Twenty-Two Automatic Metrics in the WMT22 Metric Task
Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Ting Zhu | Mengli Zhu | Ning Xie | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

Contextual word embeddings extracted from pre-trained models have become the basis for many downstream NLP tasks, including automatic evaluation of machine translation. Metrics that leverage embeddings claim to better capture synonyms and changes in word order, and thus to correlate better with human ratings than surface-form matching metrics (e.g. BLEU). However, few studies have examined the robustness of these metrics. This report uses a challenge set to uncover the brittleness of reference-based and reference-free metrics. Our challenge set aims at examining metrics’ capability to correlate synonyms in different areas and to discern catastrophic errors at both the word and sentence levels. The results show that although embedding-based metrics perform relatively well at discerning sentence-level negation/affirmation errors, their performance on relating synonyms is poor. In addition, we find that some metrics are susceptible to text styles, so their generalizability is compromised.

HW-TSC’s Submission for the WMT22 Efficiency Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Xianzhi Yu | Jianfei Feng | Ting Zhu | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Efficiency Shared Task. For this year’s task, we still apply a sentence-level distillation strategy to train small models with different configurations. Then, we integrate the average attention mechanism into the lightweight RNN model to pursue more efficient decoding. We tried adding a retraining step to our 8-bit and 4-bit models to achieve a balance between model size and quality. We still use Huawei Noah’s Bolt for INT8 inference and 4-bit storage. Coupled with Bolt’s support for batch inference and multi-core parallel computing, we finally submit models with different configurations to the CPU latency and throughput tracks to explore the Pareto frontiers.

HW-TSC Translation Systems for the WMT22 Biomedical Translation Task
Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Daimeng Wei | Xiaoyu Chen | Zongyao Li | Hengchao Shang | Shaojun Li | Ming Zhu | Yuanchang Luo | Yuhao Xie | Miaomiao Ma | Ting Zhu | Lizhi Lei | Song Peng | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the translation systems trained by Huawei Translation Services Center (HW-TSC) for the WMT22 biomedical translation task in five language pairs: English↔German (en↔de), English↔French (en↔fr), English↔Chinese (en↔zh), English↔Russian (en↔ru) and Spanish→English (es→en). Our primary systems are built on a deep Transformer with a large filter size. We also utilize R-Drop, data diversification, forward translation, back translation, data selection, fine-tuning and ensembling to improve system performance. According to the official evaluation results in OCELoT or CodaLab, our unconstrained systems in en→de, de→en, en→fr, fr→en, en→zh and es→en (clinical terminology sub-track) get the highest BLEU scores among all submissions for the WMT22 biomedical translation task.

HW-TSC Translation Systems for the WMT22 Chat Translation Task
Jinlong Yang | Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhengzhe Yu | Zhiqiang Rao | Shaojun Li | Zhanglin Wu | Yuhao Xie | Yuanchang Luo | Ting Zhu | Yanqing Zhao | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT22 chat translation shared task on English-German (en-de) in both directions, with results for the zero-shot and few-shot tracks. We use the deep Transformer architecture with a larger parameter size. Our submissions to the WMT21 News Translation task are used as the baselines. We adopt strategies such as back translation, forward translation, domain transfer, data selection, and noisy forward translation in this task, and achieve competitive results on the development set. We also test the effectiveness of document translation on chat tasks. Due to the lack of chat data, the results on the development set show that it is not as effective as sentence-level translation models.