Jorge Civera


2019

The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task
Javier Iranzo-Sánchez | Gonçal Garcés Díaz-Munío | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.

The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Pau Baquero-Arnal | Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. Our systems are based on the Transformer architecture as well as on a novel architecture, still in development, which we have called 2D alternating RNN. We have carried out domain adaptation through fine-tuning.

2018

The MLLP-UPV German-English Machine Translation System for WMT18
Javier Iranzo-Sánchez | Pau Baquero-Arnal | Gonçal V. Garcés Díaz-Munío | Adrià Martínez-Villaronga | Jorge Civera | Alfons Juan
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under “constrained” conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.

2010

Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Jesús González-Rubio | Jorge Civera | Alfons Juan | Francisco Casacuberta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, a great effort is being made to digitalise large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also at annotating them with their corresponding text transcriptions and translations into modern languages. Unfortunately, few electronic resources are available for ancient languages to be exploited by natural language processing techniques. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging but interesting nature for future development.

2009

Statistical Approaches to Computer-Assisted Translation
Sergio Barrachina | Oliver Bender | Francisco Casacuberta | Jorge Civera | Elsa Cubel | Shahram Khadivi | Antonio Lagarda | Hermann Ney | Jesús Tomás | Enrique Vidal | Juan-Miguel Vilar
Computational Linguistics, Volume 35, Number 1, March 2009

2008

Bilingual Text Classification using the IBM 1 Translation Model
Jorge Civera | Alfons Juan-Císcar
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Manual categorisation of documents is a time-consuming task that has been significantly alleviated with the deployment of automatic and machine-aided text categorisation systems. However, the proliferation of multilingual documentation has become a common phenomenon in many international organisations, while most of the current systems have focused on the categorisation of monolingual text. It has been recently shown that the inherent redundancy in bilingual documents can be effectively exploited by relatively simple, bilingual naive Bayes (multinomial) models. In this work, we present a refined version of these models in which this redundancy is explicitly captured by a combination of a unigram (multinomial) model and the well-known IBM 1 translation model. The proposed model is evaluated on two bilingual classification tasks and compared to previous work.

Improving Interactive Machine Translation via Mouse Actions
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jorge Civera | Francisco Casacuberta | Enrique Vidal | Hieu Hoang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Domain Adaptation in Statistical Machine Translation with Mixture Modelling
Jorge Civera | Alfons Juan
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Bilingual Machine-Aided Indexing
Jorge Civera | Alfons Juan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

The proliferation of multilingual documentation in our Information Society has become a common phenomenon. This documentation is usually categorised by hand, entailing a time-consuming and arduous burden. This is particularly true in the case of keyword assignment, in which a list of keywords (descriptors) from a controlled vocabulary (thesaurus) is assigned to a document. A possible solution to alleviate this problem is offered by so-called Machine-Aided Indexing (MAI) systems. These systems work in cooperation with professional indexers by providing an initial list of descriptors from which the most appropriate ones are selected. This way of proceeding increases the productivity and eases the task of indexers. In this paper, we propose a statistical text classification framework for bilingual documentation, from which we derive two novel bilingual classifiers based on the naive combination of monolingual classifiers. We report preliminary results on the multilingual corpus Acquis Communautaire (AC) that demonstrate the suitability of the proposed classifiers as the backend of a fully-working MAI system.

A Computer-Assisted Translation Tool based on Finite-State Technology
Jorge Civera | Antonio L. Lagarda | Elsa Cubel | Francisco Casacuberta | Enrique Vidal | Juan M. Vilar | Sergio Barrachina
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

Mixtures of IBM Model 2
Jorge Civera | Alfons Juan
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2004

From Machine Translation to Computer Assisted Translation using Finite-State Models
Jorge Civera | Elsa Cubel | Antonio L. Lagarda | David Picó | Jorge González | Enrique Vidal | Francisco Casacuberta | Juan M. Vilar | Sergio Barrachina
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing