Ming Zhu


2023

Empowering a Metric with LLM-assisted Named Entity Annotation: HW-TSC’s Submission to the WMT23 Metrics Shared Task
Zhanglin Wu | Yilun Liu | Min Zhang | Xiaofeng Zhao | Junhao Zhu | Ming Zhu | Xiaosong Qiao | Jingfei Zhang | Ma Miaomiao | Zhao Yanqing | Song Peng | Shimin Tao | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to the WMT23 metrics shared task, in which we submit two metrics: KG-BERTScore and HWTSC-EE-Metric. KG-BERTScore is our primary submission for the reference-free metric and provides both segment-level and system-level scoring, while HWTSC-EE-Metric is our primary submission for the reference-based metric and provides only system-level scoring. Overall, our metrics show relatively high correlations with MQM scores on the metrics tasks of previous years. In particular, on system-level scoring tasks, our metrics achieve new state-of-the-art results for many language pairs.

HW-TSC 2023 Submission for the Quality Estimation Shared Task
Yuang Li | Chang Su | Ming Zhu | Mengyao Piao | Xinglin Lyu | Min Zhang | Hao Yang
Proceedings of the Eighth Conference on Machine Translation

Quality estimation (QE) is an essential technique for assessing machine translation quality without reference translations. In this paper, we focus on Huawei Translation Services Center’s (HW-TSC’s) submission to the sentence-level QE shared task, named Ensemble-CrossQE. Our system uses CrossQE, the same model architecture as our last year’s submission, which consists of a multilingual base model and a task-specific downstream layer. The input is the concatenation of the source and the translated sentences. To enhance performance, we fine-tune and ensemble multiple base models such as XLM-R, InfoXLM, RemBERT and CometKiwi. Moreover, we introduce a new corruption-based data augmentation method, which generates deletion, substitution and insertion errors in the original translation and uses a reference-based QE model to obtain pseudo scores. Results show that our system achieves impressive performance on the sentence-level QE test sets and ranks first for three language pairs: English-Hindi, English-Tamil and English-Telugu. In addition, we participated in the error span detection task, where the submitted model outperforms the baseline on the Chinese-English and Hebrew-English language pairs.
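
The corruption-based augmentation described in this abstract is simple enough to sketch. The Python snippet below is a minimal illustration under assumptions the paper does not state (error probabilities, whitespace tokenization, and the pseudo-scoring interface are all hypothetical); the reference-based scorer is left as a user-supplied callable rather than any specific model.

import random

def corrupt_translation(tokens, p_del=0.05, p_sub=0.05, p_ins=0.05, vocab=None):
    # Apply random deletion, substitution and insertion errors to a tokenized translation.
    vocab = vocab or list(tokens)
    corrupted = []
    for tok in tokens:
        r = random.random()
        if r < p_del:
            continue                                # deletion: drop the token
        if r < p_del + p_sub:
            corrupted.append(random.choice(vocab))  # substitution: random replacement
        else:
            corrupted.append(tok)                   # keep the original token
        if random.random() < p_ins:
            corrupted.append(random.choice(vocab))  # insertion: add a spurious token
    return corrupted

def make_pseudo_example(source, reference, score_fn):
    # score_fn is any reference-based scorer: score_fn(source, hypothesis, reference) -> float.
    corrupted = " ".join(corrupt_translation(reference.split()))
    return {"src": source, "mt": corrupted, "score": score_fn(source, corrupted, reference)}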

Improving Neural Machine Translation Formality Control with Domain Adaptation and Reranking-based Transductive Learning
Zhanglin Wu | Zongyao Li | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Xiaoyu Chen | Zhiqiang Rao | Zhengzhe Yu | Jinlong Yang | Shaojun Li | Yuhao Xie | Bin Wei | Jiawei Zheng | Ming Zhu | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents Huawei Translation Service Center (HW-TSC)’s submission to the IWSLT 2023 formality control task, which provides two training scenarios, supervised and zero-shot, each covering two language pairs, under both constrained and unconstrained conditions. We train formality control models for these four language pairs under the two conditions and submit the corresponding translation results. Our efforts are divided into two fronts: enhancing general translation quality and improving formality control capability. According to the requirements of the formality control task, we use a multi-stage pre-training method to train a bilingual or multilingual neural machine translation (NMT) model as the basic model, which raises its general translation quality to a relatively high level. Then, while affecting the general translation quality of the basic model as little as possible, we adopt domain adaptation and reranking-based transductive learning to improve the formality control capability of the model.
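
The abstract does not spell out how the reranking step is implemented. As one plausible reading, a beam-search n-best list could be rescored with a separate formality classifier; the sketch below is a generic illustration of that idea, not the authors’ exact method, and the classifier interface and interpolation weight are hypothetical.

def rerank_candidates(candidates, formality_score, alpha=0.5):
    # candidates: n-best list of (hypothesis, model_score) pairs from beam search.
    # formality_score(hyp) -> probability that hyp matches the requested formality level.
    best_hyp, best = None, float("-inf")
    for hyp, model_score in candidates:
        combined = alpha * model_score + (1 - alpha) * formality_score(hyp)
        if combined > best:
            best_hyp, best = hyp, combined
    return best_hyp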

Leveraging Multilingual Knowledge Graph to Boost Domain-specific Entity Translation of ChatGPT
Min Zhang | Limin Liu | Zhao Yanqing | Xiaosong Qiao | Su Chang | Xiaofeng Zhao | Junhao Zhu | Ming Zhu | Song Peng | Yinglu Li | Yilun Liu | Wenbing Ma | Mengyao Piao | Shimin Tao | Hao Yang | Yanfei Jiang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Recently, ChatGPT has shown promising results for Machine Translation (MT) in general domains and is becoming a new paradigm for translation. In this paper, we focus on how to apply ChatGPT to domain-specific translation and propose to leverage a Multilingual Knowledge Graph (MKG) to help ChatGPT improve domain entity translation quality. To achieve this, we extract bilingual entity pairs from the MKG for the domain entities recognized in the source sentences. We then introduce these pairs into the translation prompts, instructing ChatGPT to use the correct translations of the domain entities. To evaluate this MKG method for ChatGPT, we conduct comparative experiments on three Chinese-English (zh-en) test datasets constructed from three specific domains: one from biomedical science, and two from the Information and Communications Technology (ICT) industry, namely the Visible Light Communication (VLC) and wireless domains. Experimental results demonstrate that both the overall translation quality of ChatGPT (+6.21, +3.13 and +11.25 BLEU) and the translation accuracy of domain entities (+43.2%, +30.2% and +37.9% absolute) are significantly improved with the MKG on the three test datasets.
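
The prompting recipe lends itself to a short sketch. The exact prompt wording used in the paper is not given, so the template below is an assumption; only the general idea (inject KG-derived bilingual entity pairs as translation constraints) follows the abstract, and the example entity pair is a toy.

def build_mkg_prompt(source_sentence, entity_pairs, src_lang="Chinese", tgt_lang="English"):
    # entity_pairs: domain entities recognized in the source sentence, mapped to their
    # target-language translations retrieved from the multilingual knowledge graph.
    constraints = "\n".join(f"- {src} -> {tgt}" for src, tgt in entity_pairs.items())
    return (
        f"Translate the following {src_lang} sentence into {tgt_lang}.\n"
        f"Use these entity translations exactly as given:\n{constraints}\n\n"
        f"Sentence: {source_sentence}\nTranslation:"
    )

# Toy usage with a hypothetical VLC-domain entity pair:
prompt = build_mkg_prompt("可见光通信的传输速率很高。", {"可见光通信": "visible light communication"})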

KG-IQES: An Interpretable Quality Estimation System for Machine Translation Based on Knowledge Graph
Junhao Zhu | Min Zhang | Hao Yang | Song Peng | Zhanglin Wu | Yanfei Jiang | Xijun Qiu | Weiqiang Pan | Ming Zhu | Ma Miaomiao | Weidong Zhang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

The widespread use of machine translation (MT) has driven the need for effective automatic quality estimation (AQE) methods, and enhancing the interpretability of MT quality estimation is of considerable practical value in industry. Starting from the alignment of named entities (NEs) between source and translated sentences, we construct a multilingual knowledge graph (KG) of domain-specific NEs and design a KG-based interpretable quality estimation system for machine translation (KG-IQES). KG-IQES effectively estimates translation quality without relying on reference translations, and its effectiveness has been verified in our business scenarios.
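
The entity-alignment idea behind the interpretability claim can be illustrated in a few lines. This is a simplified sketch, assuming the KG has already been flattened into a lookup from source-language entities to acceptable target-language renderings; the real system’s NER and KG components are not shown.

def entity_alignment_report(source_entities, translation, kg_lookup):
    # kg_lookup: source-language entity -> list of acceptable target-language renderings,
    # flattened from the multilingual knowledge graph.
    report = []
    for ent in source_entities:
        candidates = kg_lookup.get(ent, [])
        matched = next((c for c in candidates if c.lower() in translation.lower()), None)
        report.append({"entity": ent, "expected": candidates, "found": matched})
    return report

def entity_alignment_score(report):
    # Fraction of source entities whose expected rendering appears in the translation;
    # the per-entity report is what makes the estimate interpretable.
    if not report:
        return 1.0
    return sum(r["found"] is not None for r in report) / len(report)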

2022

HW-TSC Translation Systems for the WMT22 Biomedical Translation Task
Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Daimeng Wei | Xiaoyu Chen | Zongyao Li | Hengchao Shang | Shaojun Li | Ming Zhu | Yuanchang Luo | Yuhao Xie | Miaomiao Ma | Ting Zhu | Lizhi Lei | Song Peng | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the translation systems trained by Huawei Translation Services Center (HW-TSC) for the WMT22 biomedical translation task in five language pairs: English↔German (en↔de), English↔French (en↔fr), English↔Chinese (en↔zh), English↔Russian (en↔ru) and Spanish→English (es→en). Our primary systems are built on a deep Transformer with a large filter size. We also use R-Drop, data diversification, forward translation, back translation, data selection, fine-tuning and ensembling to improve system performance. According to the official evaluation results on OCELoT or CodaLab, our unconstrained systems in en→de, de→en, en→fr, fr→en, en→zh and es→en (clinical terminology sub-track) achieve the highest BLEU scores among all submissions to the WMT22 biomedical translation task.
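
Among the listed techniques, R-Drop has a compact formulation: the same batch is passed through the model twice with dropout active, and a symmetric KL term between the two output distributions is added to the cross-entropy loss. A minimal PyTorch sketch, assuming a hypothetical sequence model whose forward pass returns per-token logits of shape (batch, length, vocab), might look like this:

import torch.nn.functional as F

def r_drop_loss(model, src, tgt, alpha=5.0):
    # Two forward passes over the same batch; dropout makes the two output
    # distributions differ, and a symmetric KL term pulls them together.
    logits1 = model(src, tgt)   # assumed shape: (batch, length, vocab)
    logits2 = model(src, tgt)

    ce = 0.5 * (F.cross_entropy(logits1.transpose(1, 2), tgt)
                + F.cross_entropy(logits2.transpose(1, 2), tgt))

    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
                + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl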

2020

Question Answering with Long Multiple-Span Answers
Ming Zhu | Aman Ahuja | Da-Cheng Juan | Wei Wei | Chandan K. Reddy
Findings of the Association for Computational Linguistics: EMNLP 2020

Answering questions in many real-world applications often requires complex and precise information excerpted from text spanning a long document. However, no such annotated dataset is currently publicly available, which hinders the development of neural question-answering (QA) systems. To this end, we present MASH-QA, a Multiple Answer Spans Healthcare Question Answering dataset from the consumer health domain, in which answers may need to be excerpted from multiple, non-consecutive parts of text spread across a long document. We also propose MultiCo, a neural architecture that captures the relevance among multiple answer spans through a query-based contextualized sentence selection approach to form the answer to a given question. We further demonstrate that conventional QA models are not suitable for this type of task and perform poorly in this setting. Extensive experiments confirm that the proposed model significantly outperforms state-of-the-art QA models in this multi-span QA setting.
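
MultiCo itself relies on contextualized representations, but the underlying idea of query-based sentence selection for multi-span answers can be illustrated with a much simpler stand-in. The sketch below uses TF-IDF similarity purely for illustration; the threshold and the vectorizer are assumptions and not part of the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_answer_sentences(question, document_sentences, threshold=0.2):
    # Score every document sentence against the question and keep those above the
    # threshold; the kept (possibly non-consecutive) sentences form the multi-span answer.
    matrix = TfidfVectorizer().fit_transform([question] + document_sentences)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [sent for sent, score in zip(document_sentences, scores) if score >= threshold]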