2022
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation
Yinglu Li | Minghan Wang | Jiaxin Guo | Xiaosong Qiao | Yuxia Wang | Daimeng Wei | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
This paper describes the design of HW-TSC’s Offline Speech Translation System submitted to the IWSLT 2022 Evaluation. We explored both cascade and end-to-end systems on three language tracks (en-de, en-zh and en-ja), and chose the cascade system as our primary submission. For the automatic speech recognition (ASR) component of the cascade system, we use three ASR models (Conformer, S2T-Transformer and U2) trained on a mixture of five datasets. During inference, transcripts are generated with the help of a domain-controlled generation strategy. Context-aware reranking and an ensemble-based anti-interference strategy are proposed to produce better ASR outputs. For the machine translation part, we pretrained three translation models on the WMT21 dataset and fine-tuned them on in-domain corpora. Our cascade system shows performance competitive with the known offline systems in industry and academia.
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation
Minghan Wang | Jiaxin Guo | Yinglu Li | Xiaosong Qiao | Yuxia Wang | Zongyao Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
This paper presents our participation in the IWSLT 2022 simultaneous speech translation evaluation. For the text-to-text (T2T) track, we participate in three language pairs and build a wait-k based simultaneous MT (SimulMT) model for the task. The model was pretrained on WMT21 news corpora and further improved with in-domain fine-tuning and self-training. For the speech-to-text (S2T) track, we designed both cascade and end-to-end systems in three language pairs. The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track. The end-to-end system is a simultaneous speech translation (SimulST) model based on the wait-k strategy, which is directly trained on a synthetic corpus produced by translating all texts of the ASR corpora into the specific target language with an offline MT model. It also contains a heuristic sentence breaking strategy, preventing it from finishing the translation before the end of the speech. We evaluate our systems on the MuST-C tst-COMMON dataset and show that the end-to-end system is competitive with the cascade one. Meanwhile, we also demonstrate that the SimulMT model can be efficiently optimized by these approaches, resulting in improvements of 1-2 BLEU points.
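The wait-k strategy named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the model first reads k source tokens and then alternates between writing one target token and reading one more source token until the source is exhausted. The function name and the toy lengths are hypothetical and are not taken from the paper.

```python
from typing import List


def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> List[int]:
    """Return, for each target position, how many source tokens have been
    read before that target token is written under a wait-k policy.

    Assumed formulation: g(t) = min(k + t, src_len) for 0-indexed t.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if src_len < 0 or tgt_len < 0:
        raise ValueError("lengths must be non-negative")
    # Once the whole source has been read, the remaining target tokens
    # are written with full source context.
    return [min(k + t, src_len) for t in range(tgt_len)]


if __name__ == "__main__":
    # Toy example (hypothetical sizes): 5 source tokens, 6 target tokens, k = 2.
    print(wait_k_schedule(src_len=5, tgt_len=6, k=2))
```

A smaller k lowers latency because writing starts earlier, at the cost of less source context for each prediction.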
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
Jiaxin Guo | Yinglu Li | Minghan Wang | Xiaosong Qiao | Yuxia Wang | Hengchao Shang | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
The paper presents HW-TSC’s pipeline and results for Offline Speech to Speech Translation at IWSLT 2022. We design a cascade system consisting of an ASR model, a machine translation model and a TTS model to convert speech from one language into another (en-de). For the ASR part, we find that better performance can be obtained by ensembling multiple heterogeneous ASR models and performing reranking on beam candidates. We also find that combining a context-aware reranking strategy with an MT model fine-tuned on the in-domain dataset helps improve performance, because it mitigates the inconsistency in transcripts caused by the lack of context. Finally, we use the officially provided VITS model to produce audio files from the translation hypotheses.
Diformer: Directional Transformer for Neural Machine Translation
Minghan Wang | Jiaxin Guo | Yuxia Wang | Daimeng Wei | Hengchao Shang | Yinglu Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Autoregressive (AR) and non-autoregressive (NAR) models each have their own strengths in performance and latency, so combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. a Masked Language Model. However, this generalization can harm performance due to the gap between the training objective and inference. In this paper, we aim to close the gap by preserving the original objectives of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer), which jointly models AR and NAR as three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable, which works by controlling the prediction of each token to have specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumptions used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current unified-modelling works by more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive with state-of-the-art independent AR and NAR models.
Capture Human Disagreement Distributions by Calibrated Networks for Natural Language Inference
Yuxia Wang | Minghan Wang | Yimeng Chen | Shimin Tao | Jiaxin Guo | Chang Su | Min Zhang | Hao Yang
Findings of the Association for Computational Linguistics: ACL 2022
Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels due to the subjectivity of the task. Several recent efforts have been made to acknowledge and embrace the existence of ambiguity, and to explore how to capture the human disagreement distribution. In contrast with directly learning from gold ambiguity labels, which relies on special resources, we argue that the model has naturally captured the human ambiguity distribution as long as it is calibrated, i.e. the predictive probability reflects the true correctness likelihood. Our experiments show that when the model is well-calibrated, either by label smoothing or temperature scaling, it obtains performance competitive with prior work, both on divergence scores between the predictive probability and the true human opinion distribution, and on accuracy. This reveals that the overhead of collecting gold ambiguity labels can be cut by broadly solving how to calibrate the NLI network.
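The idea of comparing a calibrated predictive distribution against a human opinion distribution can be sketched as follows. This is only an illustration under stated assumptions: temperature scaling is taken in its usual form (dividing the logits by a scalar T before the softmax), KL divergence is used as one possible divergence score, and the logits, the temperature value and the human label distribution are hypothetical numbers, not results from the paper.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a temperature (T > 1 flattens the output)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); assumes q is strictly positive wherever p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Hypothetical 3-way NLI logits (entailment, neutral, contradiction).
    logits = [4.0, 1.0, 0.0]
    # Hypothetical distribution of human annotator votes for the same example.
    human = [0.6, 0.3, 0.1]
    for t in (1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs], round(kl_divergence(human, probs), 3))
```

In practice the temperature would be fitted on held-out data rather than chosen by hand; the fixed values above only show how the scaling changes the divergence to the human distribution.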
2021
Make the Blind Translator See The World: A Novel Transfer Learning Solution for Multimodal Machine Translation
Minghan Wang | Jiaxin Guo | Yimeng Chen | Chang Su | Min Zhang | Shimin Tao | Hao Yang
Proceedings of Machine Translation Summit XVIII: Research Track
The tendency of models based on large-scale pretrained networks to overfit easily on the limited labelled training data of multimodal machine translation (MMT) is a critical issue in MMT. To this end, we propose a transfer learning solution. Specifically, 1) a vanilla Transformer is pre-trained on a massive bilingual text-only corpus to obtain prior knowledge; 2) a multimodal Transformer named VLTransformer is proposed, with several components that incorporate visual contexts; and 3) the parameters of VLTransformer are initialized with the pre-trained vanilla Transformer and then fine-tuned on MMT tasks with a newly proposed method named cross-modal masking, which forces the model to learn from both modalities. We evaluate on the Multi30k en-de and en-fr datasets and improve the BLEU score by up to 8% compared with the SOTA performance. The experimental results demonstrate that performing transfer learning with a monomodal pre-trained NMT model on multimodal NMT tasks can obtain considerable boosts.
How Length Prediction Influence the Performance of Non-Autoregressive Translation?
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Hengchao Shang | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Length prediction is a special task in a series of NAT models where the target length has to be determined before generation. However, the performance of length prediction and its influence on translation quality have seldom been discussed. In this paper, we present comprehensive analyses of the length prediction task of NAT, aiming to find the factors that influence its performance, as well as how it is associated with translation quality. We mainly perform experiments based on the Conditional Masked Language Model (CMLM) (Ghazvininejad et al., 2019), a representative NAT model, and evaluate it on two language pairs, En-De and En-Ro. We draw two conclusions: 1) The performance of length prediction is mainly influenced by properties of language pairs such as alignment pattern, word order or intrinsic length ratio, and is also affected by the usage of knowledge-distilled data. 2) There is a positive correlation between the performance of length prediction and the BLEU score.
HI-CMLM: Improve CMLM with Hybrid Decoder Input
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Daimeng Wei | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 14th International Conference on Natural Language Generation
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that the mechanism of predicting all of the target words depending only on the hidden state of [MASK] is neither effective nor efficient in the initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence. In this work, we mitigate this problem by combining the copied source with embeddings of [MASK] in the decoder. Notably, it is not a straightforward copying, which is shown to be useless, but a novel heuristic hybrid strategy named fence-mask. Experimental results show that it gains consistent boosts on both the WMT14 En<->De and WMT16 En<->Ro corpora, by 0.5 BLEU on average and by 1 BLEU for less-informative short sentences. This reveals that incorporating additional information through proper strategies is beneficial for improving CMLM, particularly the translation quality of short texts and the speed of early-stage convergence.
HW-TSC’s Participation at WMT 2021 Quality Estimation Shared Task
Yimeng Chen | Chang Su | Yingtao Zhang | Yuxia Wang | Xiang Geng | Hao Yang | Shimin Tao | Guo Jiaxin | Wang Minghan | Min Zhang | Yujia Liu | Shujian Huang
Proceedings of the Sixth Conference on Machine Translation
This paper presents our work in the WMT 2021 Quality Estimation (QE) Shared Task. We participated in all three sub-tasks, namely the Sentence-Level Direct Assessment (DA) task, the Word and Sentence-Level Post-editing Effort task and the Critical Error Detection task, in all language pairs. Our systems employ the Predictor-Estimator framework, concretely with a pre-trained XLM-RoBERTa as the Predictor and a task-specific classifier or regressor as the Estimator. For all tasks, we improve our systems by incorporating the post-edited sentence or an additional high-quality translation, either through multitask learning or by encoding it with the predictor directly. Moreover, in the zero-shot setting, our data augmentation strategy based on Monte-Carlo Dropout brings significant improvement on the DA sub-task. Notably, our submissions achieve remarkable results over all tasks.
2020
HW-TSC’s Participation at WMT 2020 Automatic Post Editing Shared Task
Hao Yang | Minghan Wang | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen
Proceedings of the Fifth Conference on Machine Translation
The paper presents the submission by HW-TSC to the WMT 2020 Automatic Post Editing Shared Task. We participate in the English-German and English-Chinese language pairs. Our system is built on a Transformer pre-trained on the WMT 2019 and WMT 2020 News Translation corpora and fine-tuned on the APE corpus. Bottleneck Adapter Layers are integrated into the model to prevent over-fitting. We further collect external translations as augmented MT candidates to improve the performance. The experiments demonstrate that pre-trained NMT models are effective when fine-tuned on an APE corpus of limited size, and that the performance can be further improved with external MT augmentation. Our system achieves competitive results in both directions in the final evaluation.
HW-TSC’s Participation at WMT 2020 Quality Estimation Shared Task
Minghan Wang | Hao Yang | Hengchao Shang | Daimeng Wei | Jiaxin Guo | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen | Liangyou Li
Proceedings of the Fifth Conference on Machine Translation
This paper presents our work in the WMT 2020 Word and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task. Our system follows the standard Predictor-Estimator architecture, with a pre-trained Transformer as the Predictor, and specific classifiers and regressors as Estimators. We integrate Bottleneck Adapter Layers in the Predictor to improve the transfer learning efficiency and prevent over-fitting. At the same time, we jointly train the word- and sentence-level tasks in a unified model through multitask learning. Pseudo-PE assisted QE (PEAQE) is proposed, resulting in significant improvements in performance. Our submissions achieve competitive results in the word- and sentence-level sub-tasks for both the En-De and En-Zh language pairs.