Danni Liu


Effective combination of pretrained models - KIT@IWSLT2022
Ngoc-Quan Pham | Tuan Nam Nguyen | Thai-Binh Nguyen | Danni Liu | Carlos Mullov | Jan Niehues | Alexander Waibel
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

Pretrained models in the acoustic and textual modalities can potentially improve speech translation for both cascaded and end-to-end approaches. In this evaluation, we empirically investigate this potential by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that combining these models with an advanced audio segmentation method improves over the previous end-to-end system by up to 7 BLEU points. More importantly, the experiments showed that, given enough data and modeling capacity to overcome the training difficulty, end-to-end systems can outperform even very competitive cascaded systems. In our experiments, this gap can be as large as 2.0 BLEU points, the same margin by which cascaded systems have often led over the years.

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
Peter Polák | Ngoc-Quan Pham | Tuan Nam Nguyen | Danni Liu | Carlos Mullov | Jan Niehues | Ondřej Bojar | Alexander Waibel
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022. We explore strategies to use an offline model in a simultaneous setting without modifying the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT 2021 simultaneous system in the medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available.
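The abstract does not spell out the onlinization algorithm. As an illustration only, the sketch below assumes a local-agreement style policy, one common way to run an unmodified offline model incrementally: re-translate the growing input prefix after each chunk and emit only the tokens on which two consecutive hypotheses agree. The names `onlinize`, `longest_common_prefix` and `toy_translate` are hypothetical, and `translate` stands in for any offline model.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def longest_common_prefix(a: Sequence[str], b: Sequence[str]) -> Tokens:
    """Return the tokens shared at the start of both hypotheses."""
    prefix: Tokens = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def onlinize(
    chunks: Sequence[str],
    translate: Callable[[List[str]], Tokens],
) -> Tuple[Tokens, List[Tokens]]:
    """Run an offline `translate` on growing input prefixes (assumed policy).

    Returns the final output and the tokens emitted at each step.
    """
    received: List[str] = []
    committed: Tokens = []
    steps: List[Tokens] = []
    previous: Tokens = []
    for chunk in chunks:
        received.append(chunk)
        hypothesis = translate(list(received))  # full re-decode of the prefix so far
        agreed = longest_common_prefix(previous, hypothesis)
        # Emit only tokens that two consecutive hypotheses agree on and that
        # extend (never retract) what has already been emitted.
        if agreed[: len(committed)] == committed and len(agreed) > len(committed):
            steps.append(agreed[len(committed):])
            committed = agreed
        previous = hypothesis
    # End of input: the last hypothesis is final, so flush its remainder.
    if previous[: len(committed)] == committed and len(previous) > len(committed):
        steps.append(previous[len(committed):])
        committed = previous
    return committed, steps


def toy_translate(prefix: List[str]) -> Tokens:
    """Hypothetical stand-in for an offline model: uppercases each word."""
    return [word.upper() for word in prefix]


final, steps = onlinize(["guten", "morgen", "zusammen"], toy_translate)
print(final, steps)
```

Waiting for agreement trades a little latency for stability: a token is emitted only once a second, longer input prefix confirms it, so early guesses that the model later revises never reach the output.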

Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu | Jan Niehues
Proceedings of the Seventh Conference on Machine Translation (WMT)

The cornerstone of multilingual neural translation is shared representations across languages. Given the theoretically infinite representation power of neural networks, semantically identical sentences are likely represented differently. While representing sentences in the continuous latent space ensures expressiveness, it introduces the risk of capturing irrelevant features, which hinders the learning of a common representation. In this work, we discretize the encoder output latent space of multilingual models by assigning encoder states to entries in a codebook, which in effect represents source sentences in a new artificial language. This discretization process not only offers a new way to interpret the otherwise black-box model representations but, more importantly, has the potential to increase robustness in unseen testing conditions. We validate our approach in large-scale experiments with realistic data volumes and domains. When tested in zero-shot conditions, our approach is competitive with two strong alternatives from the literature. We also use the learned artificial language to analyze model behavior and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.
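The core discretization step, assigning each encoder state to a codebook entry, can be sketched as a nearest-neighbor lookup. This is a minimal sketch under stated assumptions: Euclidean distance is used as the assignment criterion, the codebook is given rather than learned, and `quantize` is a hypothetical name. How the paper trains the codebook and passes gradients through the discrete step is not shown.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    states: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each encoder state to its nearest codebook entry (assumed criterion).

    The indices act as the discrete "words" of the artificial language; the
    returned vectors replace the continuous states as decoder input.
    """
    codes: List[int] = []
    for state in states:
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(state, codebook[k]),
        )
        codes.append(nearest)
    return codes, [codebook[k] for k in codes]


# Toy 2-dimensional codebook and encoder states (made-up numbers).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
codes, quantized = quantize([[0.1, -0.2], [4.8, 5.1], [0.9, 1.2]], codebook)
print(codes)

# Slightly different states for two "translations" of one sentence
# collapse onto the same code sequence.
codes_en, _ = quantize([[0.1, -0.2], [4.8, 5.1]], codebook)
codes_de, _ = quantize([[-0.1, 0.1], [5.2, 4.9]], codebook)
print(codes_en == codes_de)
```

The toy example shows the intended effect: small, language-specific differences in continuous states disappear once both sides are expressed as the same sequence of codebook indices.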


Unsupervised Machine Translation On Dravidian Languages
Sai Koneru | Danni Liu | Jan Niehues
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Unsupervised neural machine translation (UNMT) is especially beneficial for under-resourced languages such as those of the Dravidian family. UNMT systems learn to translate between the source and target languages relying solely on monolingual corpora. However, they fail in scenarios that occur often when dealing with low-resource languages. Recent works have achieved state-of-the-art results by adding auxiliary parallel data with similar languages. In this work, we focus on unsupervised translation between English and Kannada by using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential in unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for dissimilar language pairs. Our experiments show that it is crucial for Kannada and the reference languages to be similar. Further, we propose a method to measure language similarity in order to choose the most beneficial reference languages.
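The abstract does not say which transliteration scheme was used. One simple option for Dravidian scripts, assumed here for illustration, exploits the fact that the Unicode blocks for Indic scripts follow a parallel layout, so many characters can be mapped between scripts by a fixed code-point offset. The function `transliterate` and the table `SCRIPT_BLOCK_START` are hypothetical names; the mapping is approximate, because some code points have no assigned counterpart in the target script (Tamil in particular has fewer letters).

```python
# Start of each script's 128-code-point Unicode block.
SCRIPT_BLOCK_START = {
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text: str, source: str, target: str) -> str:
    """Approximate offset-based transliteration between Indic scripts.

    Characters outside the source block (Latin letters, spaces, punctuation)
    are copied through unchanged.
    """
    src = SCRIPT_BLOCK_START[source]
    tgt = SCRIPT_BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + tgt))  # same relative position, new block
        else:
            out.append(ch)
    return "".join(out)


# Kannada "namaskara" written with escapes to avoid ambiguity.
kannada = "\u0ca8\u0cae\u0cb8\u0ccd\u0c95\u0cbe\u0cb0"
print(transliterate(kannada, "kannada", "telugu"))
```

Mapping related languages into one script this way makes shared words and subwords visible to a joint vocabulary, which is the kind of overlap the knowledge-sharing argument depends on.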

Maastricht University’s Multilingual Speech Translation System for IWSLT 2021
Danni Liu | Jan Niehues
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes Maastricht University’s participation in the IWSLT 2021 multilingual speech translation track. The task in this track is to build multilingual speech translation systems in supervised and zero-shot directions. Our primary system is an end-to-end model that performs both speech transcription and translation. We observe that joint training for the two tasks is complementary, especially when speech translation data is scarce. On the source and target sides, we use data augmentation and pseudo-labels, respectively, to improve the performance of our systems. We also introduce an ensembling technique that consistently improves the quality of transcriptions and translations. The experiments show that the end-to-end system is competitive with its cascaded counterpart, especially in zero-shot conditions.

Improving Zero-Shot Translation by Disentangling Positional Information
Danni Liu | Jan Niehues | James Cross | Francisco Guzmán | Xian Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e., zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests that the model representations are highly specific to the language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. Through thorough inspection of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.
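Why a residual connection preserves positional correspondence can be seen in a toy encoder sublayer. This is a minimal sketch, not the paper's model: it assumes uniform self-attention (every position attends equally to every other) and omits layer normalization, the feed-forward block and learned weights. `encoder_sublayer` and its `residual` flag are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def uniform_self_attention(x: Matrix) -> Matrix:
    """Toy attention: every position receives the mean of all input positions."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    return [list(mean) for _ in range(n)]


def encoder_sublayer(x: Matrix, residual: bool = True) -> Matrix:
    """Attention sublayer with or without the residual connection."""
    attended = uniform_self_attention(x)
    if residual:
        # Position i keeps its own input x_i, so output i stays tied to token i.
        return [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attended)]
    # Without the residual, output i depends only on the attention result.
    return attended


x = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(encoder_sublayer(x, residual=True))
print(encoder_sublayer(x, residual=False))
# Reordering the input changes nothing once the residual is removed.
print(encoder_sublayer(x[::-1], residual=False) == encoder_sublayer(x, residual=False))
```

With the residual, each output row still carries its own input token, so the representation mirrors the source word order; without it, the rows are free to encode content independently of input position, which is the property the abstract links to more language-independent representations.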

Maastricht University’s Large-Scale Multilingual Machine Translation System for WMT 2021
Danni Liu | Jan Niehues
Proceedings of the Sixth Conference on Machine Translation

We present our development of the multilingual machine translation system for the large-scale multilingual machine translation task at WMT 2021. Starting from the provided baseline system, we investigated several techniques to improve the translation quality on the target subset of languages. We were able to significantly improve the translation quality by adapting the system towards the target subset of languages and by generating synthetic data using the initial model. Techniques successfully applied in zero-shot multilingual machine translation (e.g., a similarity regularizer) had only a minor effect on the final translation performance.


Adapting End-to-End Speech Recognition for Readable Subtitles
Danni Liu | Jan Niehues | Gerasimos Spanakis
Proceedings of the 17th International Conference on Spoken Language Translation

Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases, such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task that is challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that, with far less data than needed to train a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.
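The abstract does not name the length-modeling methods that were compared. As one published way to make a Transformer decoder aware of a length budget, and an assumption here rather than a description of this paper's system, length-difference positional encoding (Takase and Okazaki, 2019) replaces the absolute position in the sinusoidal encoding with the number of tokens still allowed. The function name `length_difference_encoding` is hypothetical.

```python
import math
from typing import List


def length_difference_encoding(position: int, length: int, d_model: int) -> List[float]:
    """Sinusoidal encoding of the remaining budget (length - position).

    Same form as the standard Transformer positional encoding, but counting
    down to the desired output length instead of up from the start.
    """
    remaining = length - position
    enc: List[float] = []
    for i in range(0, d_model, 2):  # i plays the role of 2k in 10000^(2k/d)
        angle = remaining / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:d_model]  # trim the extra cosine when d_model is odd


# A decoder limited to 12 subtitle tokens, at its 5th output step.
print(length_difference_encoding(position=5, length=12, d_model=8))
# At the budget boundary the encoding is the same for every length limit.
print(length_difference_encoding(position=12, length=12, d_model=4))
```

Because the encoding reaches a fixed pattern exactly when the budget runs out, the decoder can learn to finish the subtitle there, whatever limit a particular screen or reading speed imposes.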