2022
MR-P: A Parallel Decoding Algorithm for Iterative Refinement Non-Autoregressive Translation
Hao Cheng | Zhihua Zhang
Findings of the Association for Computational Linguistics: ACL 2022
Non-autoregressive translation (NAT) predicts all the target tokens in parallel and significantly speeds up inference. The Conditional Masked Language Model (CMLM) is a strong NAT baseline; it decodes with the Mask-Predict algorithm, which iteratively refines the output. Most work on CMLM focuses on the model structure and the training objective, but the decoding algorithm is equally important. We propose a simple, effective, and easy-to-implement decoding algorithm that we call MaskRepeat-Predict (MR-P). When selecting tokens to mask for the next iteration, MR-P gives higher priority to consecutive repeated tokens, and it stops iterating once the target tokens converge. We conduct extensive experiments on six translation directions with varying data sizes. The results show that MR-P significantly improves performance with the same model parameters. Specifically, we achieve a BLEU increase of 1.39 points on the WMT’14 En-De translation task.
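The masking heuristic is simple enough to sketch. Below is a minimal, hypothetical Python illustration of MR-P decoding; the model.first_pass/model.refine interface and the MASK id are assumptions for exposition, not the authors' implementation.

MASK = 3  # assumed mask token id

def select_mask_positions(tokens, confidences, num_to_mask):
    """Pick positions to re-mask, giving priority to consecutive repeats."""
    repeated = [i for i in range(len(tokens))
                if (i > 0 and tokens[i] == tokens[i - 1])
                or (i + 1 < len(tokens) and tokens[i] == tokens[i + 1])]
    repeated.sort(key=lambda i: confidences[i])  # least confident first
    chosen = repeated[:num_to_mask]
    if len(chosen) < num_to_mask:
        # Fall back to the globally least confident remaining positions.
        taken = set(chosen)
        rest = sorted((i for i in range(len(tokens)) if i not in taken),
                      key=lambda i: confidences[i])
        chosen += rest[:num_to_mask - len(chosen)]
    return chosen

def mr_p_decode(model, src, length, max_iters=10):
    tokens, confidences = model.first_pass(src, length)  # assumed API
    for t in range(1, max_iters):
        prev = list(tokens)
        num_to_mask = length * (max_iters - t) // max_iters
        for i in select_mask_positions(tokens, confidences, num_to_mask):
            tokens[i] = MASK
        tokens, confidences = model.refine(src, tokens)  # assumed API
        if tokens == prev:  # early stop once the output converges
            break
    return tokens

The only departures from plain Mask-Predict here are the repeated-token priority in select_mask_positions and the early exit once the output stops changing.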
Con-NAT: Contrastive Non-autoregressive Neural Machine Translation
Hao Cheng | Zhihua Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022
Inspired by the success of contrastive learning in natural language processing, we incorporate contrastive learning into the conditional masked language model that is widely used in non-autoregressive neural machine translation (NAT). Accordingly, we propose a Contrastive Non-autoregressive Neural Machine Translation (Con-NAT) model. Con-NAT optimizes the similarity of several different representations of the same token in the same sentence. We propose two methods to obtain these representations: Contrastive Common Mask and Contrastive Dropout. Positive pairs are different representations of the same token, while negative pairs are representations of different tokens. In the feature space, the contrastive loss pulls positive pairs together and pushes negative pairs apart. We conduct extensive experiments on six translation directions with different data sizes. The results demonstrate that Con-NAT yields consistent and significant improvements in both fully non-autoregressive and iterative NAT. Con-NAT achieves state-of-the-art results on WMT’16 Ro-En (34.18 BLEU).
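As a rough illustration of the objective, here is a minimal PyTorch sketch of a token-level contrastive loss in the spirit of Contrastive Dropout, where two decoder passes with independent dropout masks produce two views of every token; the InfoNCE form below is our assumption, not a transcription of the paper.

import torch
import torch.nn.functional as F

def token_contrastive_loss(h1, h2, temperature=0.1):
    """h1, h2: (num_tokens, dim) views of the same tokens from two
    dropout-perturbed decoder passes. h1[i] and h2[i] form a positive
    pair; every other token representation acts as a negative."""
    z1 = F.normalize(h1, dim=-1)
    z2 = F.normalize(h2, dim=-1)
    logits = z1 @ z2.t() / temperature  # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Cross-entropy on the diagonal pulls positives together and
    # pushes the off-diagonal (negative) pairs apart.
    return F.cross_entropy(logits, targets)

In training, a loss of this kind would presumably be added to the usual CMLM cross-entropy term.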
2021
Multi-split Reversible Transformers Can Enhance Neural Machine Translation
Yuekai Zhao | Shuchang Zhou | Zhihua Zhang
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Large-scale transformers have been shown to achieve state-of-the-art results on neural machine translation. However, training these increasingly wider and deeper models can be tremendously memory intensive. We reduce the memory burden by employing the idea of reversible networks, in which a layer’s input can be reconstructed from its output. We design three types of multi-split reversible transformers. We also devise a corresponding backpropagation algorithm, which does not need to store activations for most layers. Furthermore, we present two fine-tuning techniques, splits shuffle and self ensemble, to boost translation accuracy. Specifically, our best models surpass the vanilla transformer by at least 1.4 BLEU points on three datasets. Our large-scale reversible models achieve 30.0 BLEU on WMT’14 En-De and 43.5 BLEU on WMT’14 En-Fr, beating several very strong baselines with less than half of the training memory.
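The reversibility trick at the heart of the paper is easy to sketch. The block below is a classic two-split reversible layer (RevNet-style); the paper generalizes this to multiple splits, so treat the code as an illustrative assumption rather than the exact architecture.

import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g  # e.g., attention and feed-forward sublayers

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs, so activations for this
        # layer never need to be cached during the forward pass.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

A memory-efficient backward pass walks the layers in reverse, calling inverse to regenerate each layer's inputs on the fly instead of storing them.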
Memory-Efficient Differentiable Transformer Architecture Search
Yuekai Zhao | Li Dong | Yelong Shen | Zhihua Zhang | Furu Wei | Weizhu Chen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2020
Active Learning Approaches to Enhancing Neural Machine Translation
Yuekai Zhao | Haoran Zhang | Shuchang Zhou | Zhihua Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020
Active learning is an efficient approach for mitigating data dependency when training neural machine translation (NMT) models. In this paper, we explore new training frameworks by incorporating active learning into techniques such as transfer learning and iterative back-translation (IBT) under a limited human translation budget. We design a word-frequency-based acquisition function and combine it with a strong uncertainty-based method. The combined method steadily outperforms all other acquisition functions in various scenarios. To the best of our knowledge, we are the first to conduct a large-scale study on actively training Transformers for NMT. Specifically, with a human translation budget of only 20% of the original parallel corpus, we manage to surpass a Transformer trained on the entire parallel corpus in three language pairs.
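The abstract does not say how the two acquisition signals are combined, so the Python sketch below is purely illustrative: a rarity score derived from word frequencies, mixed linearly with an externally supplied uncertainty score. All names and the mixing scheme are assumptions.

import math
from collections import Counter

def rarity_score(sentence, counts: Counter, total: int):
    """Word-frequency acquisition: favor sentences with rare words."""
    words = sentence.split()
    return sum(-math.log((counts[w] + 1) / (total + 1))
               for w in words) / max(1, len(words))

def select_batch(pool, counts, total, uncertainty, alpha=0.5, budget=100):
    """Rank the unlabeled pool by a linear mix of rarity and an
    uncertainty score (e.g., negative model log-probability)."""
    ranked = sorted(pool,
                    key=lambda s: alpha * rarity_score(s, counts, total)
                                  + (1 - alpha) * uncertainty(s),
                    reverse=True)
    return ranked[:budget]  # send the top sentences for human translation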
Train Once, and Decode As You Like
Chao Tian | Yifei Wang | Hao Cheng | Yijiang Lian | Zhihua Zhang
Proceedings of the 28th International Conference on Computational Linguistics
In this paper we propose a unified approach that supports different generation manners of machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models. Our approach works by repeatedly selecting positions and generating tokens at the selected positions. After being trained once, our model achieves translation performance that is better than or competitive with strong task-specific baselines in all settings. This generalization ability stems mainly from the new training objective that we propose. We validate our approach on the WMT’14 English-German and IWSLT’14 German-English translation tasks, and the experimental results are encouraging.
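The shared select-positions-then-generate loop can be sketched generically; plugging in different selection policies recovers the different generation manners. The model.predict interface, the MASK id, and the example policies below are hypothetical illustrations, not the paper's code.

MASK = 3  # assumed mask token id

def decode(model, src, length, select_positions, max_steps=50):
    tokens = [MASK] * length
    for _ in range(max_steps):
        positions = select_positions(tokens)
        if not positions:  # nothing left to generate or refine
            break
        preds = model.predict(src, tokens, positions)  # assumed API
        for i, tok in zip(positions, preds):
            tokens[i] = tok
    return tokens

# Example selection policies:
def left_to_right(tokens):  # autoregressive: one next position at a time
    return [tokens.index(MASK)] if MASK in tokens else []

def k_at_a_time(tokens, k=4):  # semi-autoregressive: k positions per step
    return [i for i, t in enumerate(tokens) if t == MASK][:k]

A refinement-based non-autoregressive policy would instead re-select low-confidence positions after an initial fully parallel pass.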
2016
ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter
Yunxiao Zhou | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
ECNU at SemEval-2016 Task 5: Extracting Effective Features from Relevant Fragments in Sentence for Aspect-Based Sentiment Analysis in Reviews
Mengxiao Jiang | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
ECNU at SemEval 2016 Task 6: Relevant or Not? Supportive or Not? A Two-step Learning System for Automatic Detecting Stance in Tweets
Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
ECNU at SemEval-2016 Task 7: An Enhanced Supervised Learning Method for Lexicon Sentiment Intensity Ranking
Feixiang Wang | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
2015
ECNU: Multi-level Sentiment Analysis on Twitter Using Traditional Linguistic Features and Word Embedding Features
Zhihua Zhang | Guoshun Wu | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
ECNU: Extracting Effective Features from Multiple Sequential Sentences for Target-dependent Sentiment Analysis in Reviews
Zhihua Zhang | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
ECNU: A Combination Method and Multiple Features for Aspect Extraction and Sentiment Polarity Classification
Fangxi Zhang | Zhihua Zhang | Man Lan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)