Germán Sanchis-Trilles
Also published as: Germán Sanchis, Germán Sanchis Trilles, German Sanchis-Trilles
2019
Filtering of Noisy Parallel Corpora Based on Hypothesis Generation
Zuzanna Parcheta | Germán Sanchis-Trilles | Francisco Casacuberta
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
The noisy parallel corpus filtering task at WMT 2019 challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a noisy parallel corpora filtering system based on generating hypotheses by means of a translation model. We train translation models for both language pairs, Nepali–English and Sinhala–English, using the provided parallel corpora. We select the training subset for three language pairs (Nepali, Sinhala and Hindi to English) jointly using bilingual cross-entropy selection to create the best possible translation model for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences which can lower the translation score. These heuristics are based on sentence length, source and target similarity and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, over which we achieve significant improvements in one of the conditions in the shared task. The designed filtering system is domain independent and all experiments are conducted using neural machine translation.
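As a rough illustration of the filtering idea in the abstract above, the following Python sketch scores a sentence pair with a smoothed sentence-level BLEU between the target and a machine-translated hypothesis, after applying simple length and copy heuristics. The add-one smoothing, the thresholds `min_bleu` and `max_len_ratio`, and all function names are assumptions made for this sketch, not details taken from the paper. The hypothesis is assumed to come from a separately trained translation model, and language detection is left out.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing on every n-gram order (assumed variant)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = ngram_counts(ref, n), ngram_counts(hyp, n)
        # Clipped matches: each hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((matches + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_precision / max_n)


def keep_pair(source, target, hypothesis, min_bleu=0.3, max_len_ratio=2.0):
    """Return True if the (source, target) pair passes the heuristics and the BLEU threshold."""
    src, tgt = source.split(), target.split()
    if not src or not tgt:
        return False
    if source.strip() == target.strip():  # target is just a copy of the source
        return False
    if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_len_ratio:
        return False
    return smoothed_bleu(target, hypothesis) >= min_bleu
```

In practice the thresholds would be tuned on held-out data, and pairs could be ranked by the BLEU score rather than accepted or rejected outright.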
2018
Creating the best development corpus for Statistical Machine Translation systems
Mara Chinea-Rios | Germán Sanchis-Trilles | Francisco Casacuberta
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
We propose and study three different novel approaches for tackling the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is leveraged for translating a specific test set, without further data from the domain at hand. This test set stems from a real application of machine translation, where the texts of a specific e-commerce site were to be translated. For developing our development-set selection techniques, we first conducted experiments in a controlled scenario, where labelled data from different domains was available, and evaluated the techniques both with classification and translation quality metrics. Then, the best-performing techniques were evaluated on the e-commerce data at hand, yielding consistent improvements across two language directions.
Data selection for NMT using Infrequent n-gram Recovery
Zuzanna Parcheta | Germán Sanchis-Trilles | Francisco Casacuberta
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Neural Machine Translation (NMT) has achieved promising results comparable with Phrase-Based Statistical Machine Translation (PBSMT). However, to train a neural translation engine, much more powerful machines are required than those required to develop translation engines based on PBSMT. One solution to reduce the training cost of NMT systems is the reduction of the training corpus through data selection (DS) techniques. There are many DS techniques applied in PBSMT which bring good results. In this work, we show that the data selection technique based on infrequent n-gram occurrence described in (Gascó et al., 2012) commonly used for PBSMT systems also works well for NMT systems. We focus our work on selecting data according to specific corpora using the previously mentioned technique. The specific-domain corpora used for our experiments are IT domain and medical domain. The DS technique significantly reduces the execution time required to train the model between 87% and 93%. Also, it improves translation quality by up to 2.8 BLEU points. The improvements are obtained with just a small fraction of the data that accounts for between 6% and 20% of the total data.
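The selection idea mentioned in this abstract can be sketched roughly as follows. This is a simplified greedy variant written for illustration, not the exact algorithm of Gascó et al. (2012): it repeatedly picks, from an out-of-domain pool, the sentence that covers the most in-domain n-grams still seen fewer than `threshold` times in the selected data. The function names, the `threshold` and `max_n` parameters, and the scoring rule are all assumptions.

```python
from collections import Counter


def all_ngrams(tokens, max_n):
    """List every n-gram of order 1..max_n in a token list."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def select_infrequent(pool, in_domain, threshold=2, max_n=2):
    """Greedily select pool sentences that cover infrequent in-domain n-grams."""
    # Only n-grams that occur in the in-domain text are worth recovering.
    needed = set()
    for sentence in in_domain:
        needed.update(all_ngrams(sentence.split(), max_n))

    counts = Counter()  # occurrences of needed n-grams in the selected data
    selected, remaining = [], list(pool)

    def gain(sentence):
        grams = Counter(g for g in all_ngrams(sentence.split(), max_n) if g in needed)
        # An n-gram contributes only while it is still below the threshold.
        return sum(min(c, threshold - counts[g])
                   for g, c in grams.items() if counts[g] < threshold)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:  # no remaining sentence adds an infrequent n-gram
            break
        selected.append(best)
        remaining.remove(best)
        counts.update(g for g in all_ngrams(best.split(), max_n) if g in needed)
    return selected
```

For example, with the in-domain sentence "install the driver" and `threshold=1`, only pool sentences that contribute not-yet-covered n-grams such as "install the" or "the driver" are selected, and the loop stops once every reachable n-gram is covered.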
Implementing a neural machine translation engine for mobile devices: the Lingvanex use case
Zuzanna Parcheta | Germán Sanchis-Trilles | Aliaksei Rudak | Siarhei Bratchenia
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
In this paper, we present the challenge entailed by implementing a mobile version of a neural machine translation system, where the goal is to maximise translation quality while minimising model size. We explain the whole process of implementing the translation engine on an English–Spanish example and we describe all the difficulties found and the solutions implemented. The main techniques used in this work are data selection by means of Infrequent n-gram Recovery, appending a special word at the end of each sentence, and generating additional samples without the final punctuation marks. The last two techniques were devised with the purpose of achieving a translation model that generates sentences without the final full stop, or other punctuation marks. Also, in this work, the Infrequent n-gram Recovery was used for the first time to create a new corpus, and not enlarge the in-domain dataset. Finally, we get a small size model with quality good enough to serve for daily use.
2014
Integrating online and active learning in a computer-assisted translation workbench
Vicent Alabau | Jesús González-Rubio | Daniel Ortiz-Martínez | Germán Sanchis-Trilles | Francisco Casacuberta | Mercedes García-Martínez | Bartolomé Mesa-Lao | Dan Cheung Petersen | Barbara Dragsted | Michael Carl
Workshop on interactive and adaptive machine translation
This paper describes a pilot study with a computer-assisted translation workbench aiming at testing the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench.
Efficient wordgraph for interactive translation prediction
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 17th Annual Conference of the European Association for Machine Translation
CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Evaluating the effects of interactivity in a post-editing workbench
Nancy Underwood | Bartolomé Mesa-Lao | Mercedes García Martínez | Michael Carl | Vicent Alabau | Jesús González-Rubio | Luis A. Leiva | Germán Sanchis-Trilles | Daniel Ortíz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the field trial and subsequent evaluation of a post-editing workbench which is currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype of the workbench, this second prototype includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors’ gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.
Online optimisation of log-linear weights in interactive machine translation
Mara Chinea Rios | Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Whenever the quality provided by a machine translation system is not enough, a human expert is required to correct the sentences provided by the machine translation system. In such a setup, it is crucial that the system is able to learn from the errors that have already been corrected. In this paper, we analyse the applicability of discriminative ridge regression for learning the log-linear weights of a state-of-the-art machine translation system underlying an interactive machine translation framework, with encouraging results.
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
Ulrich Germann | Michael Carl | Philipp Koehn | Germán Sanchis-Trilles | Francisco Casacuberta | Robin Hill | Sharon O’Brien
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
2013
User Evaluation of Advanced Interaction Features for a Computer-Assisted Translation Workbench
Vicente Alabau | Jesus Gonzalez-Rubio | Luis A. Leiva | Daniel Ortiz-Martínez | German Sanchis-Trilles | Francisco Casacuberta | Bartolomé Mesa-Lao | Ragnar Bonk | Michael Carl | Mercedes Garcia-Martinez
Proceedings of Machine Translation Summit XIV: User track
Advanced computer aided translation with a web-based workbench
Vicent Alabau | Ragnar Bonk | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Jesús González | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Oriz | Hervé Saint-Amand | Germán Sanchis | Chara Tsiukala
Proceedings of the 2nd Workshop on Post-editing Technology and Practice
2012
Does more data always yield better translations?
Guillem Gascó | Martha-Alicia Rocha | Germán Sanchis-Trilles | Jesús Andrés-Ferrer | Francisco Casacuberta
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
2011
Bilingual segmentation for phrasetable pruning in Statistical Machine Translation
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jesús González-Rubio | Jorge González
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
2010
A Deterministic Annealing-Based Training Algorithm For Statistical Machine Translation Models
Pascual Martínez Gómez | Kei Hashimoto | Yoshihiko Nankaku | Keiichi Tokuda | Germán Sanchis-Trilles
Proceedings of the 14th Annual Conference of the European Association for Machine Translation
Online Language Model adaptation via N-gram Mixtures for Statistical Machine Translation
Germán Sanchis-Trilles | Mauro Cettolo
Proceedings of the 14th Annual Conference of the European Association for Machine Translation
ITI-UPV machine translation system for IWSLT 2010
Guillem Gascó | Vicent Alabau | Jesús-Andrés Ferrer | Jesús González-Rubio | Martha-Alicia Rocha | Germán Sanchis-Trilles | Francisco Casacuberta | Jorge González | Joan-Andreu Sánchez
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper presents the submissions of the PRHLT group for the evaluation campaign of the International Workshop on Spoken Language Translation. We focus on the development of reliable translation systems between syntactically different languages (DIALOG task) and on the efficient training of SMT models in resource-rich scenarios (TALK task).
Log-linear weight optimisation via Bayesian Adaptation in Statistical Machine Translation
Germán Sanchis-Trilles | Francisco Casacuberta
Coling 2010: Posters
UPV-PRHLT English–Spanish System for WMT10
Germán Sanchis-Trilles | Jesús Andrés-Ferrer | Guillem Gascó | Jesús González-Rubio | Pascual Martínez-Gómez | Martha-Alicia Rocha | Joan-Andreu Sánchez | Francisco Casacuberta
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
UCH-UPV English–Spanish System for WMT10
Francisco Zamora-Martínez | Germán Sanchis-Trilles
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
The UPV-PRHLT Combination System for WMT 2010
Jesús González-Rubio | Germán Sanchis-Trilles | Joan-Andreu Sánchez | Jesús Andrés-Ferrer | Guillem Gascó | Pascual Martínez-Gómez | Martha-Alicia Rocha | Francisco Casacuberta
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2009
FBK at IWSLT 2009
Nicola Bertoldi | Arianna Bisazza | Mauro Cettolo | Germán Sanchis-Trilles | Marcello Federico
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper reports on the participation of FBK at the IWSLT 2009 Evaluation. This year we worked on the Arabic-English and Turkish-English BTEC tasks with a special effort on linguistic preprocessing techniques involving morphological segmentation. In addition, we investigated the adaptation problem in the development of systems for the Chinese-English and English-Chinese challenge tasks; in particular, we explored different ways for clustering training data into topic or dialog-specific subsets: by producing (and combining) smaller but more focused models, we intended to make better use of the available training data, with the ultimate purpose of improving translation quality.
Online language model adaptation for spoken dialog translation
Germán Sanchis-Trilles | Mauro Cettolo | Nicola Bertoldi | Marcello Federico
Proceedings of the 6th International Workshop on Spoken Language Translation: Papers
This paper focuses on the problem of language model adaptation in the context of Chinese-English cross-lingual dialogs, as set-up by the challenge task of the IWSLT 2009 Evaluation Campaign. Mixtures of n-gram language models are investigated, which are obtained by clustering bilingual training data according to different available human annotations, respectively, at the dialog level, turn level, and dialog act level. For the latter case, clustering of IWSLT data was in fact induced through a comparable Italian-English parallel corpus provided with dialog act annotations. For the sake of adaptation, mixture weight estimation is performed either at the level of single source sentence or test set. Estimated weights are then transferred to the target language mixture model. Experimental results show that, by training different specific language models weighted according to the actual input instead of using a single target language model, significant gains in terms of perplexity and BLEU can be achieved.
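To make the mixture weight estimation step more concrete, here is a minimal sketch of the standard expectation-maximisation update for the weights of a linear mixture of language models. The abstract does not state which estimation procedure was used, so EM is an assumption here, as are the function name and the input format: each row of `component_probs` holds the probabilities that the component models assign to one token of the adaptation text, and all of them are assumed to be strictly positive (for example, from smoothed n-gram models).

```python
def estimate_mixture_weights(component_probs, iterations=20):
    """Estimate linear mixture weights by EM.

    component_probs[t][k] is the probability that component model k
    assigns to token t of the adaptation text (assumed > 0).
    """
    num_components = len(component_probs[0])
    weights = [1.0 / num_components] * num_components  # uniform start
    for _ in range(iterations):
        totals = [0.0] * num_components
        for probs in component_probs:
            mixture = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for this token.
            for k in range(num_components):
                totals[k] += weights[k] * probs[k] / mixture
        # M-step: new weights are the average responsibilities.
        weights = [t / len(component_probs) for t in totals]
    return weights
```

Running the estimation on a single source sentence or on the whole test set only changes which tokens are passed in as `component_probs`.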
2008
A novel alignment model inspired on IBM Model 1
Jesús González-Rubio | Germán Sanchis-Trilles | Alfons Juan | Francisco Casacuberta
Proceedings of the 12th Annual Conference of the European Association for Machine Translation
Improving Interactive Machine Translation via Mouse Actions
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jorge Civera | Francisco Casacuberta | Enrique Vidal | Hieu Hoang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
Using Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars
Germán Sanchis | Joan Andreu Sánchez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
An important problem when using Stochastic Inversion Transduction Grammars is their computational cost. More specifically, when dealing with corpora such as Europarl, even a single iteration of the estimation algorithm becomes prohibitive. In this work, we reduce this cost by taking advantage of the bracketing information in parsed corpora, and show machine translation results obtained with a bracketed Europarl corpus, yielding interesting improvements when increasing the number of non-terminal symbols.
2007
Co-authors
- Francisco Casacuberta 19
- Jesús González-Rubio 9
- Daniel Ortiz-Martínez 8
- Vicent Alabau 6
- Michael Carl 6
- Mercedes García-Martínez 5
- Bartolomé Mesa-Lao 5
- Jesús-Andrés Ferrer 4
- Guillem Gascó 4
- Luis A. Leiva 4
- Martha-Alicia Rocha 4
- Joan-Andreu Sánchez 4
- Mauro Cettolo 3
- Philipp Koehn 3
- Pascual Martínez-Gómez 3
- Zuzanna Parcheta 3
- Nicola Bertoldi 2
- Ragnar Bonk 2
- Christian Buck 2
- Mara Chinea-Ríos 2
- Marcello Federico 2
- Ulrich Germann 2
- Jorge González 2
- Robin L. Hill 2
- Herve Saint-Amand 2
- Arianna Bisazza 1
- Siarhei Bratchenia 1
- Jorge Civera 1
- Barbara Dragsted 1
- Jesús González 1
- Kei Hashimoto 1
- Hieu Hoang 1
- Alfons Juan 1
- Yoshihiko Nankaku 1
- Daniel Oriz 1
- Sharon O’Brien 1
- Dan Cheung Petersen 1
- Aliaksei Rudak 1
- Keiichi Tokuda 1
- Chara Tsiukala 1
- Chara Tsoukala 1
- Nancy Underwood 1
- Enrique Vidal 1
- Francisco Zamora-Martínez 1