Manuel Herranz


2022

English-Russian Data Augmentation for Neural Machine Translation
Nikita Teslenko Grygoryev | Mercedes Garcia Martinez | Francisco Casacuberta Nolla | Amando Estela Pastor | Manuel Herranz
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Workshop 2: Corpus Generation and Corpus Augmentation for Machine Translation)

Data Augmentation (DA) refers to strategies for increasing the diversity of training examples without explicitly collecting new data manually. We have used neural networks and linguistic resources to automatically generate text in Russian. The system generates new texts using information from embeddings trained on large amounts of data in neural language models. Data from the public domain have been used for the experiments. The generated texts enlarge the corpus used to train models for NLP tasks such as machine translation. Finally, we analysed the quality of the generated texts and added them to the training process of Neural Machine Translation (NMT) models. To evaluate the quality of the NMT models, we first compared them quantitatively using several standard automatic machine translation metrics, and measured the time spent and the amount of text generated to assess their suitability for use in the language industry. Second, we compared the NMT models qualitatively by presenting and contrasting example translations. Using our DA method, we achieve better results than a baseline model by fine-tuning NMT systems on the newly generated datasets.

MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents
Victoria Arranz | Khalid Choukri | Montse Cuadros | Aitor García Pablos | Lucie Gianola | Cyril Grouin | Manuel Herranz | Patrick Paroubek | Pierre Zweigenbaum
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

Europeana Translate: Providing multilingual access to digital cultural heritage
Eirini Kaldeli | Mercedes García-Martínez | Antoine Isaac | Paolo Sebastiano Scalia | Arne Stabenau | Iván Lena Almor | Carmen Grau Lacal | Martín Barroso Ordóñez | Amando Estela | Manuel Herranz
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Europeana Translate is a project funded under the Connecting Europe Facility with the objective of leveraging state-of-the-art machine translation to increase the multilinguality of resources in the cultural heritage domain.

2021


Neural Translation for European Union (NTEU)
Mercedes García-Martínez | Laurent Bié | Aleix Cerdà | Amando Estela | Manuel Herranz | Rihards Krišlauks | Maite Melero | Tony O’Dowd | Sinead O’Gorman | Marcis Pinnis | Artūrs Stafanovič | Riccardo Superbo | Artūrs Vasiļevskis
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

The Neural Translation for the European Union (NTEU) engine farm enables direct machine translation for all 24 official languages of the European Union without the need to use a high-resourced language as a pivot. This amounts to a total of 552 translation engines, one for each ordered pair of the 24 languages (24 × 23). We have collected parallel data for all the language combinations, publicly shared on elrc-share.eu. The translation engines have been customized to the domain for use by European public administrations. The delivered engines will be published in the European Language Grid. In addition to the usual automatic metrics, all the engines have been evaluated by humans using the direct assessment methodology. For this purpose, we built an open-source platform called MTET. The evaluation shows that most of the engines reach high quality and obtain better scores than an external machine translation service in a blind evaluation setup.

2020

Eco.pangeamt: Industrializing Neural MT
Mercedes García-Martínez | Manuel Herranz | Amando Estela | Ángela Franco | Laurent Bié
Proceedings of the 1st International Workshop on Language Technology Platforms

Eco is Pangeanic’s customer portal for generic or specialized translation services (machine translation and post-editing, generic API MT and custom API MT). Users can request the processing (translation) of files in different formats. Client users can also manage their engines and models, including cloning and retraining them.

A User Study of the Incremental Learning in NMT
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In the translation industry, human experts usually supervise and post-edit machine translation hypotheses. Adaptive neural machine translation systems, able to incrementally update the underlying models under an online learning regime, have been proven useful for improving the efficiency of this workflow. However, this incremental adaptation is somewhat unstable and may lead to undesirable side effects. One of them is the sporadic appearance of made-up words, a byproduct of an erroneous application of subword segmentation techniques. In this work, we extend previous studies on on-the-fly adaptation of neural machine translation systems. We perform a user study involving professional, experienced post-editors, delving deeper into the aforementioned problems. Results show that adaptive systems were able to learn how to generate the correct translation for task-specific terms, resulting in an improvement in the users’ productivity. We also observed a close morphological similarity between made-up words and the words that were expected.

The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project
Ēriks Ajausks | Victoria Arranz | Laurent Bié | Aleix Cerdà-i-Cucó | Khalid Choukri | Montse Cuadros | Hans Degroote | Amando Estela | Thierry Etchegoyhen | Mercedes García-Martínez | Aitor García-Pablos | Manuel Herranz | Alejandro Kohan | Maite Melero | Mike Rosner | Roberts Rozis | Patrick Paroubek | Artūrs Vasiļevskis | Pierre Zweigenbaum
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages. The project runs from January 2020 until December 2021.

Neural Translation for the European Union (NTEU) Project
Laurent Bié | Aleix Cerdà-i-Cucó | Hans Degroote | Amando Estela | Mercedes García-Martínez | Manuel Herranz | Alejandro Kohan | Maite Melero | Tony O’Dowd | Sinéad O’Gorman | Mārcis Pinnis | Roberts Rozis | Riccardo Superbo | Artūrs Vasiļevskis
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The Neural Translation for the European Union (NTEU) project aims to build a neural engine farm with all European official language combinations for eTranslation, without the necessity to use a high-resourced language as a pivot. NTEU started in September 2019 and will run until August 2021.

2019

NEC TM Data Project
Alexandre Helle | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

iADAATPA Project: Pangeanic use cases
Mercedes García-Martínez | Amando Estela | Laurent Bié | Alexandre Helle | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Large-scale Machine Translation Evaluation of the iADAATPA Project
Sheila Castilho | Natália Resende | Federico Gaspari | Andy Way | Tony O’Dowd | Marek Mazur | Manuel Herranz | Alex Helle | Gema Ramírez-Sánchez | Víctor Sánchez-Cartagena | Mārcis Pinnis | Valters Šics
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Incremental Adaptation of NMT for Professional Post-editors: A User Study
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2016

PangeaMT v 3 – customise your own machine translation environment
Alexandre Helle | Manuel Herranz
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

2015

The EXPERT project: Advancing the state of the art in hybrid translation technologies
Constantin Orasan | Alessandro Cattelan | Gloria Corpas Pastor | Josef van Genabith | Manuel Herranz | Juan José Arevalillo | Qun Liu | Khalil Sima’an | Lucia Specia
Proceedings of Translating and the Computer 37

2013

Pangeanic in the EXPERT Project: Exploiting Empirical appRoaches to Translation
Manuel Herranz | Alex Helle | Elia Yuste | Ruslan Mitkov | Lucia Specia
Proceedings of Machine Translation Summit XIV: European projects