Adrià Martínez-Villaronga


2018

The MLLP-UPV German-English Machine Translation System for WMT18
Javier Iranzo-Sánchez | Pau Baquero-Arnal | Gonçal V. Garcés Díaz-Munío | Adrià Martínez-Villaronga | Jorge Civera | Alfons Juan
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under “constrained” conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.

Neural Speech Translation at AppTek
Evgeny Matusov | Patrick Wilken | Parnia Bahar | Julian Schamper | Pavel Golik | Albert Zeyer | Joan Albert Silvestre-Cerda | Adrià Martínez-Villaronga | Hendrik Pesch | Jan-Thorsten Peter
Proceedings of the 15th International Conference on Spoken Language Translation

This work describes AppTek’s speech translation pipeline that includes strong state-of-the-art automatic speech recognition (ASR) and neural machine translation (NMT) components. We show how these components can be tightly coupled by encoding ASR confusion networks, as well as ASR-like noise adaptation, vocabulary normalization, and implicit punctuation prediction during translation. In another experimental setup, we propose a direct speech translation approach that can be scaled to translation tasks with large amounts of text-only parallel training data but a limited number of hours of recorded and human-translated speech.

2014

Comparison of data selection techniques for the translation of video lectures
Joern Wuebker | Hermann Ney | Adrià Martínez-Villaronga | Adrià Giménez | Alfons Juan | Christophe Servan | Marc Dymetman | Shachar Mirkin
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths.