Dongdong Zhang


2021

How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Weijia Xu | Shuming Ma | Dongdong Zhang | Marine Carpuat
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Improving Multilingual Neural Machine Translation with Auxiliary Source Languages
Weijia Xu | Yuwei Yin | Shuming Ma | Dongdong Zhang | Haoyang Huang
Findings of the Association for Computational Linguistics: EMNLP 2021

Multilingual neural machine translation models typically handle one source language at a time. However, prior work has shown that translating from multiple source languages improves translation quality. Unlike existing multi-source translation approaches, which are limited to the test scenario where parallel source sentences from multiple languages are available at inference time, we propose to improve multilingual translation in a more common scenario by exploiting synthetic source sentences from auxiliary languages. We train our model on synthetic multi-source corpora and apply random masking to enable flexible inference with single-source or bi-source inputs. Extensive experiments on Chinese/English-Japanese and a large-scale multilingual translation benchmark show that our model significantly outperforms the multilingual baseline by up to +4.0 BLEU, with the largest improvements on low-resource or distant language pairs.
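
The random masking step can be illustrated with a minimal sketch. The abstract does not say what exactly is masked, so this assumes the auxiliary-language source is dropped with some probability during training, so that the model sees both single-source and bi-source inputs. The function name `mask_sources` and the parameter `p_drop_aux` are hypothetical and not taken from the paper.

```python
import random


def mask_sources(primary, auxiliary, p_drop_aux=0.5, rng=random):
    """Pick the source sentences for one training example.

    Assumption (not from the paper): "random masking" is modeled here as
    dropping the synthetic auxiliary source with probability p_drop_aux.
    """
    if auxiliary is None or rng.random() < p_drop_aux:
        # Single-source input: only the original source sentence.
        return [primary]
    # Bi-source input: original source plus the synthetic auxiliary source.
    return [primary, auxiliary]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(mask_sources("src sentence", "synthetic aux sentence", 0.5, rng))
```

At inference time the same interface would accept either one or two sources, which is the flexibility the abstract describes.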

Smart-Start Decoding for Neural Machine Translation
Jian Yang | Shuming Ma | Dongdong Zhang | Juncheng Wan | Zhoujun Li | Ming Zhou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most current neural machine translation models adopt a monotonic decoding order of either left-to-right or right-to-left. In this work, we propose a novel method, called Smart-Start decoding, that breaks the limitation of these decoding orders. More specifically, our method first predicts a median word. It then decodes the words on the right side of the median word before generating the words on the left. We evaluate the proposed Smart-Start decoding method on three datasets. Experimental results show that the proposed method can significantly outperform strong baseline models.
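
The decoding order described above can be sketched as a simple reordering of the target sequence. This is an illustration under stated assumptions, not the authors' implementation: the median word is taken to open the right-hand segment, the left-hand segment is emitted left to right, and the helper names `smart_start_order` and `restore_order` are hypothetical.

```python
def smart_start_order(tokens, median_index):
    """Reorder a target sentence into the assumed generation order.

    The median word and everything to its right come first, followed by
    the words to the left of the median word.
    """
    right = tokens[median_index:]   # median word + right-hand side
    left = tokens[:median_index]    # words left of the median word
    return right + left


def restore_order(generated, n_right):
    """Undo the reordering, given how many tokens the right segment has."""
    right = generated[:n_right]
    left = generated[n_right:]
    return left + right


if __name__ == "__main__":
    target = ["the", "cat", "sat", "down"]
    order = smart_start_order(target, median_index=2)
    print(order)
    print(restore_order(order, n_right=2))
```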

Multilingual Agreement for Multilingual Neural Machine Translation
Jian Yang | Yuwei Yin | Shuming Ma | Haoyang Huang | Dongdong Zhang | Zhoujun Li | Furu Wei
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Although multilingual neural machine translation (MNMT) enables translation across multiple languages, the training process is based on independent multilingual objectives. Most multilingual models cannot explicitly exploit different language pairs to assist each other, ignoring the relationships among them. In this work, we propose a novel agreement-based method to encourage multilingual agreement among different translation directions, which minimizes the differences among them. We combine the multilingual training objectives with the agreement term by randomly substituting some fragments of the source language with their counterpart translations in auxiliary languages. To examine the effectiveness of our method, we conduct experiments on the multilingual translation task of 10 language pairs. Experimental results show that our method achieves significant improvements over the previous multilingual baselines.
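
The random substitution step can be illustrated with a minimal sketch. It simplifies fragments to single tokens and assumes a lookup table from source tokens to auxiliary-language translations is already available; the table, the function name `substitute_fragments`, and the probability `p` are all hypothetical and not taken from the paper.

```python
import random


def substitute_fragments(source_tokens, aux_translations, p=0.3, rng=random):
    """Randomly replace source tokens with auxiliary-language counterparts.

    aux_translations maps a source token to a list of tokens in an
    auxiliary language (a hypothetical, pre-built table). Each token that
    has an entry is replaced with probability p.
    """
    mixed = []
    for token in source_tokens:
        if token in aux_translations and rng.random() < p:
            mixed.extend(aux_translations[token])
        else:
            mixed.append(token)
    return mixed


if __name__ == "__main__":
    table = {"cat": ["Katze"], "house": ["Haus"]}  # toy entries for illustration
    rng = random.Random(0)
    print(substitute_fragments(["the", "cat", "is", "in", "the", "house"], table, 0.5, rng))
```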

Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
Guanhua Chen | Shuming Ma | Yun Chen | Li Dong | Dongdong Zhang | Jia Pan | Wenping Wang | Furu Wei
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or improving the performance on supervised machine translation with BERT. However, whether the MPE can help facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with a parallel dataset of only one language pair and an off-the-shelf MPE, and is then directly tested on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and gets further improvement with a position disentangled encoder and a capacity-enhanced decoder. Using this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much less training computation cost and training data, our model achieves better performance on 15 any-to-English test sets than CRISS and m2m-100, two strong multilingual NMT baselines.

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
Jian Yang | Shuming Ma | Haoyang Huang | Dongdong Zhang | Li Dong | Shaohan Huang | Alexandre Muzio | Saksham Singhal | Hany Hassan | Xia Song | Furu Wei
Proceedings of the Sixth Conference on Machine Translation

This report describes Microsoft’s machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks, including the Large Track and two Small Tracks, where the former is unconstrained and the latter two are fully constrained. Our model submissions to the shared task were initialized with DeltaLM, a generic pre-trained multilingual encoder-decoder model, and fine-tuned on the large collected parallel data and allowed data sources according to the track settings. We also applied progressive learning and iterative back-translation to further improve performance. Our final submissions ranked first on three tracks in terms of the automatic evaluation metric.

2020

Document Classification for COVID-19 Literature
Bernal Jiménez Gutiérrez | Juncheng Zeng | Dongdong Zhang | Ping Zhang | Yu Su
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset. We find that pre-trained language models outperform other models in both low and high data regimes, achieving a maximum F1 score of around 86%. We note that even the highest performing models still struggle with label correlation, distraction from introductory text and CORD-19 generalization. Both data and code are available on GitHub.

Document Classification for COVID-19 Literature
Bernal Jiménez Gutiérrez | Juncheng Zeng | Dongdong Zhang | Ping Zhang | Yu Su
Findings of the Association for Computational Linguistics: EMNLP 2020

The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset, a growing collection of 23,000 research papers regarding the novel 2019 coronavirus. We find that pre-trained language models fine-tuned on this dataset outperform all other baselines and that BioBERT surpasses the others by a small margin, with micro-F1 and accuracy scores of around 86% and 75% respectively on the test set. We evaluate the data efficiency and generalizability of these models as essential features of any system prepared to deal with an urgent situation like the current health crisis. We perform a data ablation study to determine how important article titles are for achieving reasonable performance on this dataset. Finally, we explore 50 errors made by the best performing models on LitCovid documents and find that they often (1) correlate certain labels too closely together and (2) fail to focus on discriminative sections of the articles, both of which are important issues to address in future work. Both data and code are available on GitHub.
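
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how micro-F1 and accuracy can be computed for multi-label predictions. It assumes that accuracy means exact-match (all labels of a document predicted correctly), which the abstract does not spell out; the function names and the toy labels are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool true/false positives over all documents."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match_accuracy(gold, pred):
    """Fraction of documents whose predicted label set equals the gold set
    (an assumed reading of "accuracy" in the multi-label setting)."""
    matches = sum(set(g) == set(p) for g, p in zip(gold, pred))
    return matches / len(gold)


if __name__ == "__main__":
    gold = [{"A", "B"}, {"C"}]
    pred = [{"A"}, {"C", "D"}]
    print(micro_f1(gold, pred), exact_match_accuracy(gold, pred))
```

Exact-match accuracy is the stricter of the two, which is consistent with it being the lower number reported.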

A Simple and Effective Unified Encoder for Document-Level Machine Translation
Shuming Ma | Dongdong Zhang | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most of the existing models for document-level machine translation adopt dual-encoder structures. The representations of the source sentences and the document-level contexts are modeled with two separate encoders. Although these models can make use of the document-level contexts, they do not fully model the interaction between the contexts and the source sentences, and cannot directly adapt to recent pre-training models (e.g., BERT), which encode multiple sentences with a single encoder. In this work, we propose a simple and effective unified encoder that can outperform dual-encoder baseline models in terms of BLEU and METEOR scores. Moreover, pre-training models can further boost the performance of our proposed model.

Improving Neural Machine Translation with Soft Template Prediction
Jian Yang | Shuming Ma | Dongdong Zhang | Zhoujun Li | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although neural machine translation (NMT) has achieved significant progress in recent years, most previous NMT models depend only on the source text to generate translations. Inspired by the success of template-based and syntax-based approaches in other fields, we propose to use templates extracted from tree structures as soft target templates to guide the translation procedure. In order to learn the syntactic structure of the target sentences, we adopt constituency-based parse trees to generate candidate templates. We incorporate the template information into the encoder-decoder framework to jointly utilize the templates and source text. Experiments show that our model significantly outperforms the baseline models on four benchmarks and demonstrates the effectiveness of soft target templates.

2017

Sequence-to-Dependency Neural Machine Translation
Shuangzhi Wu | Dongdong Zhang | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Nowadays a typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, during which latent syntactic structures of the target sentences are not explicitly considered. Inspired by the success of using syntactic knowledge of the target language to improve statistical machine translation, in this paper we propose a novel Sequence-to-Dependency Neural Machine Translation (SD-NMT) method, in which the target word sequence and its corresponding dependency structure are jointly constructed and modeled, and this structure is used as context to facilitate word generation. Experimental results show that the proposed method significantly outperforms state-of-the-art baselines on Chinese-English and Japanese-English translation tasks.

2015

Efficient Disfluency Detection with Transition-based Parsing
Shuangzhi Wu | Dongdong Zhang | Ming Zhou | Tiejun Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Learning Topic Representation for SMT with Neural Networks
Lei Cui | Dongdong Zhang | Shujie Liu | Qiming Chen | Mu Li | Ming Zhou | Muyun Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Lexicalized Reordering Model for Hierarchical Phrase-based Translation
Hailong Cao | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Soft Dependency Matching for Hierarchical Phrase-based Machine Translation
Hailong Cao | Dongdong Zhang | Ming Zhou | Tiejun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Multi-Domain Adaptation for SMT Using Multi-Task Learning
Lei Cui | Xilun Chen | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Punctuation Prediction with Transition-based Parsing
Dongdong Zhang | Shuangzhi Wu | Nan Yang | Mu Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bilingual Data Cleaning for SMT using Graph-based Random Walk
Lei Cui | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Nan Yang | Mu Li | Dongdong Zhang | Nenghai Yu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hierarchical Chunk-to-String Translation
Yang Feng | Dongdong Zhang | Mu Li | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
Seung-Wook Lee | Dongdong Zhang | Mu Li | Ming Zhou | Hae-Chang Rim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Function Word Generation in Statistical Machine Translation Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of Machine Translation Summit XIII: Papers

2010

Mixture Model-based Minimum Bayes Risk Decoding using Multiple Machine Translation Systems
Nan Duan | Mu Li | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Mu Li | Yinggong Zhao | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Hybrid Decoding: Decoding with Partial Hypotheses Combination over Multiple SMT Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Coling 2010: Posters

A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of the ACL 2010 Conference Short Papers

2009

Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders
Mu Li | Nan Duan | Dongdong Zhang | Chi-Ho Li | Ming Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Better Synchronous Binarization for Machine Translation
Tong Xiao | Mu Li | Dongdong Zhang | Jingbo Zhu | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Introduction to China’s CWMT2008 Machine Translation Evaluation
Hongmei Zhao | Jun Xie | Qun Liu | Yajuan Lü | Dongdong Zhang | Mu Li
Proceedings of Machine Translation Summit XII: Papers

2008

An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
Chi-Ho Li | Hailei Zhang | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of the Third Workshop on Statistical Machine Translation

Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
Ming Zhou | Bo Wang | Shujie Liu | Mu Li | Dongdong Zhang | Tiejun Zhao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Measure Word Generation for English-Chinese SMT Systems
Dongdong Zhang | Mu Li | Nan Duan | Chi-Ho Li | Ming Zhou
Proceedings of ACL-08: HLT

2007

A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
Chi-Ho Li | Minghui Li | Dongdong Zhang | Mu Li | Ming Zhou | Yi Guan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang | Mu Li | Chi-Ho Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)