Jingyi Zhang


2021

A Bidirectional Transformer Based Alignment Model for Unsupervised Word Alignment
Jingyi Zhang | Josef van Genabith
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word alignment and machine translation are two closely related tasks. Neural translation models, such as RNN-based and Transformer models, employ a target-to-source attention mechanism which can provide rough word alignments, but with rather low accuracy. High-quality word alignment can help neural machine translation in many different ways, such as missing word detection, annotation transfer and lexicon injection. Existing methods for learning word alignment include statistical word aligners (e.g. GIZA++) and, more recently, neural word alignment models. This paper presents a bidirectional Transformer based alignment (BTBA) model for unsupervised learning of the word alignment task. Our BTBA model predicts the current target word by attending to the source context and to both the left-side and right-side target context to produce accurate target-to-source attention (alignment). We further fine-tune the target-to-source attention in the BTBA model to obtain better alignments using a full context based optimization method and self-supervised training. We test our method on three word alignment tasks and show that our method outperforms both previous neural word alignment approaches and the popular statistical word aligner GIZA++.
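The alignments evaluated here ultimately come from turning the model's target-to-source attention into hard alignment links. A minimal, generic sketch of that conversion step (not the BTBA model itself; the function name and threshold are assumptions):

```python
import numpy as np

def attention_to_alignment(attn, threshold=0.0):
    """Turn a target-to-source attention matrix into hard word alignments.

    attn: array of shape (target_len, source_len), each row sums to 1.
    Returns a set of (target_index, source_index) links, at most one per
    target word, keeping only links whose attention weight exceeds threshold.
    """
    links = set()
    for t, row in enumerate(attn):
        s = int(np.argmax(row))
        if row[s] > threshold:
            links.add((t, s))
    return links

# Toy example: 3 target words attending over 4 source words.
attn = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.10, 0.10, 0.20, 0.60],
])
print(attention_to_alignment(attn))  # {(0, 0), (1, 1), (2, 3)}
```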

2020

Learning Source Phrase Representations for Neural Machine Translation
Hongfei Xu | Josef van Genabith | Deyi Xiong | Qiuhui Liu | Jingyi Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The Transformer translation model (Vaswani et al., 2017), based on a multi-head attention mechanism, can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though intuitively the attentional network can connect distant words via shorter network paths than RNNs, empirical analysis demonstrates that it still has difficulty in fully capturing long-distance dependencies (Tang et al., 2018). Considering that modeling phrases instead of words significantly improved the Statistical Machine Translation (SMT) approach through the use of larger translation blocks (“phrases”) and phrase reordering, modeling NMT at the phrase level is an intuitive proposal to help the model capture long-distance relationships. In this paper, we first propose an attentive phrase representation generation mechanism which is able to generate phrase representations from the corresponding token representations. In addition, we incorporate the generated phrase representations into the Transformer translation model to enhance its ability to capture long-distance relationships. In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline, which shows the effectiveness of our approach. Our approach helps Transformer Base models perform at the level of Transformer Big models, and even significantly better for long sentences, but with substantially fewer parameters and training steps. The fact that phrase representations help even in the big setting further supports our conjecture that they make a valuable contribution to long-distance relations.
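As a rough illustration of generating a phrase representation from token representations, here is a generic attentive-pooling sketch (not the paper's exact mechanism; the class name and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Pool a span of token representations into a single phrase vector
    using a learned attention query (a generic attentive-pooling sketch)."""

    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d_model))
        self.scale = d_model ** -0.5

    def forward(self, tokens):
        # tokens: (phrase_len, d_model) representations of one phrase's tokens
        scores = tokens @ self.query * self.scale   # (phrase_len,)
        weights = torch.softmax(scores, dim=0)      # attention over the tokens
        return weights @ tokens                     # (d_model,) phrase vector

# Toy usage: a 3-token phrase with 8-dimensional token representations.
pool = AttentivePhrasePooling(d_model=8)
phrase_vec = pool(torch.randn(3, 8))
print(phrase_vec.shape)  # torch.Size([8])
```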

Lipschitz Constrained Parameter Initialization for Deep Transformers
Hongfei Xu | Qiuhui Liu | Josef van Genabith | Deyi Xiong | Jingyi Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The Transformer translation model employs residual connections and layer normalization to ease the optimization difficulties caused by its multi-layer encoder/decoder structure. Previous research shows that even with residual connections and layer normalization, deep Transformers are still hard to train, and in particular Transformer models with more than 12 encoder/decoder layers fail to converge. In this paper, we first empirically demonstrate that a simple modification made in the official implementation, which changes the computation order of residual connection and layer normalization, can significantly ease the optimization of deep Transformers. We then compare the subtle differences in computation order in considerable detail, and present a parameter initialization method that applies a Lipschitz constraint to the initialization of Transformer parameters and effectively ensures training convergence. In contrast to findings in previous research, we further demonstrate that with Lipschitz parameter initialization, deep Transformers with the original computation order can converge and obtain significant BLEU improvements with up to 24 layers. In contrast to previous research which focuses on deep encoders, our approach additionally enables Transformers to also benefit from deep decoders.
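The contrast in the abstract is the computation order of residual connection and layer normalization. A minimal sketch of the two orders, commonly called post-LN and pre-LN (the Lipschitz initialization itself is not reproduced here; the toy sublayer and sizes are assumptions):

```python
import torch
import torch.nn as nn

def post_ln_sublayer(x, sublayer, norm):
    # Original computation order (Vaswani et al., 2017):
    # apply the sublayer, add the residual, then normalize.
    return norm(x + sublayer(x))

def pre_ln_sublayer(x, sublayer, norm):
    # Modified order referred to in the abstract:
    # normalize first, apply the sublayer, then add the residual,
    # which keeps an identity path through the whole layer stack.
    return x + sublayer(norm(x))

# Toy usage with a feed-forward sublayer of width 16.
norm = nn.LayerNorm(16)
ffn = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
x = torch.randn(2, 5, 16)
print(post_ln_sublayer(x, ffn, norm).shape)  # torch.Size([2, 5, 16])
print(pre_ln_sublayer(x, ffn, norm).shape)   # torch.Size([2, 5, 16])
```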

Translation Quality Estimation by Jointly Learning to Score and Rank
Jingyi Zhang | Josef van Genabith
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The translation quality estimation (QE) task, particularly the QE as a Metric task, aims to evaluate the general quality of a translation based on the translation and the source sentence, without using reference translations. Supervised learning of this QE task requires human evaluation of translation quality as training data. Human evaluation of translation quality can be performed in different ways, including assigning an absolute score to a translation or ranking different translations. In order to make use of different types of human evaluation data for supervised learning, we present a multi-task learning QE model that jointly learns two tasks: scoring a translation and ranking two translations. Our QE model exploits cross-lingual sentence embeddings from pre-trained multilingual language models. We obtain new state-of-the-art results on the WMT 2019 QE as a Metric task and outperform sentBLEU on the WMT 2019 Metrics task.
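A minimal sketch of the joint score-and-rank objective described above, assuming pre-computed cross-lingual sentence embeddings passed in as plain tensors (the class name, MLP sizes and margin are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class JointQEScorer(nn.Module):
    """Score a (source, translation) pair from sentence embeddings; the same
    scorer is trained on both scoring and pairwise-ranking data (a sketch)."""

    def __init__(self, emb_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, src_emb, hyp_emb):
        return self.mlp(torch.cat([src_emb, hyp_emb], dim=-1)).squeeze(-1)

def joint_loss(model, scored_batch, ranked_batch, margin=0.1):
    # Task 1: regress the human quality score of a single translation.
    src, hyp, gold = scored_batch
    score_loss = nn.functional.mse_loss(model(src, hyp), gold)
    # Task 2: rank two translations of the same source sentence, pushing
    # the better one at least `margin` above the worse one.
    src, better, worse = ranked_batch
    rank_loss = nn.functional.margin_ranking_loss(
        model(src, better), model(src, worse),
        target=torch.ones(src.size(0)), margin=margin)
    return score_loss + rank_loss

# Toy usage with 4 examples and 32-dimensional sentence embeddings.
model = JointQEScorer(emb_dim=32)
scored = (torch.randn(4, 32), torch.randn(4, 32), torch.rand(4))
ranked = (torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 32))
print(joint_loss(model, scored, ranked))
```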

2019

DFKI-NMT Submission to the WMT19 News Translation Task
Jingyi Zhang | Josef van Genabith
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the DFKI-NMT submission to the WMT19 News translation task. We participated in both English-to-German and German-to-English directions. We trained Transformer models and adopted various techniques for effectively training our models, including data selection, back-translation and in-domain fine-tuning. We give a detailed analysis of the performance of our system.

2018

UniMa at SemEval-2018 Task 7: Semantic Relation Extraction and Classification from Scientific Publications
Thorsten Keiper | Zhonghao Lyu | Sara Pooladzadeh | Yuan Xu | Jingyi Zhang | Anne Lauscher | Simone Paolo Ponzetto
Proceedings of the 12th International Workshop on Semantic Evaluation

Large repositories of scientific literature call for the development of robust methods to extract information from scholarly papers. This problem is addressed by the SemEval 2018 Task 7 on extracting and classifying relations found within scientific publications. In this paper, we present a feature-based and a deep learning-based approach to the task and discuss the results of the system runs that we submitted for evaluation.

Guiding Neural Machine Translation with Retrieved Translation Pieces
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar to the input sentence, and then collect n-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call “translation pieces”. We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show that our method improves NMT translation results by up to 6 BLEU points on three narrow domain translation tasks where repetitiveness of the target sentences is particularly salient. It also causes little increase in translation time, and compares favorably to another alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.
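A simplified sketch of the translation-piece bonus (it skips the word-alignment filtering applied when collecting n-grams; the function names and bonus weight are assumptions):

```python
def collect_translation_pieces(retrieved, max_n=4):
    """Collect weighted target n-grams ("translation pieces") from retrieved
    examples. `retrieved` is a list of (target_tokens, similarity) pairs,
    where similarity acts as the pseudo-probability of the retrieved sentence."""
    pieces = {}
    for target_tokens, similarity in retrieved:
        for n in range(1, max_n + 1):
            for i in range(len(target_tokens) - n + 1):
                ngram = tuple(target_tokens[i:i + n])
                pieces[ngram] = max(pieces.get(ngram, 0.0), similarity)
    return pieces

def piece_bonus(output_tokens, pieces, weight=1.0, max_n=4):
    """Bonus added to an NMT output's score for each collected piece it contains."""
    bonus = 0.0
    for n in range(1, max_n + 1):
        for i in range(len(output_tokens) - n + 1):
            bonus += pieces.get(tuple(output_tokens[i:i + n]), 0.0)
    return weight * bonus

# Toy usage.
pieces = collect_translation_pieces([("der kleine Hund".split(), 0.8)])
print(piece_bonus("der kleine Hund bellt".split(), pieces))
```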

2017

Improving Neural Machine Translation through Phrase-based Forced Decoding
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using the phrase-based decoding cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
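A minimal sketch of the reranking step, assuming the soft forced decoding cost is available as a black-box function (the names and interpolation weight are assumptions; the forced decoding algorithm itself is not reproduced):

```python
def rerank_nbest(nbest, phrase_cost, weight=1.0):
    """Rerank n-best NMT outputs with a phrase-based decoding cost.

    nbest: list of (hypothesis, nmt_log_prob) pairs.
    phrase_cost: function mapping a hypothesis to its (soft) forced-decoding
                 cost under the phrase-based model; lower means more adequate.
    Returns hypotheses sorted by the combined score, best first.
    """
    scored = [(hyp, nmt_score - weight * phrase_cost(hyp))
              for hyp, nmt_score in nbest]
    return [hyp for hyp, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

# Toy usage with a stand-in cost function.
nbest = [("a good translation", -1.2), ("a fluent but wrong one", -0.9)]
print(rerank_nbest(nbest, phrase_cost=lambda h: len(h.split()) * 0.5))
```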

NICT-NAIST System for WMT17 Multimodal Translation Task
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

2016

A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

A Binarized Neural Network Joint Model for Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Learning Hierarchical Translation Spans
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Vietnamese to Chinese Machine Translation via Chinese Character as Pivot
Hai Zhao | Tianjiao Yin | Jingyi Zhang
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)