Francisco Casacuberta

Also published as: F. Casacuberta

2021

Introducing Mouse Actions into Interactive-Predictive Neural Machine Translation
Ángel Navarro | Francisco Casacuberta
Proceedings of Machine Translation Summit XVIII: Research Track

The quality of the translations generated by Machine Translation (MT) systems has greatly improved over the years, but we are still far from obtaining fully automatic high-quality translations. To generate them, translators make use of Computer-Assisted Translation (CAT) tools, among which we find the Interactive-Predictive Machine Translation (IPMT) systems. In this paper, we use bandit feedback as the main and only information needed to generate new predictions that correct the previous translations. The application of bandit feedback significantly reduces the number of words that the translator needs to type in an IPMT session. In conclusion, the use of this technique saves translators useful time and effort, and its performance improves with future advances in MT, so we recommend its application in current IPMT systems.

2020

In the translation industry, human experts usually supervise and post-edit machine translation hypotheses. Adaptive neural machine translation systems, able to incrementally update the underlying models under an online learning regime, have been proven to be useful to improve the efficiency of this workflow. However, this incremental adaptation is somewhat unstable, and it may lead to undesirable side effects. One of them is the sporadic appearance of made-up words, as a byproduct of an erroneous application of subword segmentation techniques. In this work, we extend previous studies on on-the-fly adaptation of neural machine translation systems. We perform a user study involving professional, experienced post-editors, delving deeper into the aforementioned problems. Results show that adaptive systems were able to learn how to generate the correct translation for task-specific terms, resulting in an improvement of the user’s productivity. We also observed a close similarity, in terms of morphology, between made-up words and the words that were expected.

NICE: Neural Integrated Custom Engines
Daniel Marín Buj | Daniel Ibáñez García | Zuzanna Parcheta | Francisco Casacuberta
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In this paper, we present a machine translation system implemented by the Translation Centre for the Bodies of the European Union (CdT). The main goal of this project is to create domain-specific machine translation engines in order to support machine translation services and applications to the Translation Centre’s clients. In this article, we explain the entire implementation process of NICE: Neural Integrated Custom Engines. We describe the problems identified and the solutions provided, and present the final results for different language pairs. Finally, we describe the work that will be done on this project in the future.

2019

Demonstration of a Neural Machine Translation System with Online Learning for Translators
Miguel Domingo | Mercedes García-Martínez | Amando Estela Pastor | Laurent Bié | Alexander Helle | Álvaro Peris | Francisco Casacuberta | Manuel Herranz Pérez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a demonstration of our system, which implements online learning for neural machine translation in a production environment. These techniques allow the system to continuously learn from the corrections provided by the translators. We implemented an end-to-end platform integrating our machine translation servers with one of the most common user interfaces for professional translators: SDL Trados Studio. We aim to save post-editing effort, as the machine is continuously learning from its mistakes and adapting the models to a specific domain or user style.

A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks
Álvaro Peris | Francisco Casacuberta
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence-to-sequence tasks. The system generates text predictions for different sequence-to-sequence tasks: machine translation, and image and video captioning. These predictions are revised by a human agent, who introduces corrections in the form of characters. The system reacts to each correction, providing alternative hypotheses that comply with the feedback provided by the user. The final objective is to reduce the human effort required during this correction process. This system is implemented following a client-server architecture. For accessing the system, we developed a website, which communicates with the neural model, hosted on a local server. From this website, the different tasks can be tackled following the interactive-predictive framework. We open-source all the code developed for building this system. The demonstration is hosted at http://casmacat.prhlt.upv.es/interactive-seq2seq.

Filtering of Noisy Parallel Corpora Based on Hypothesis Generation
Zuzanna Parcheta | Germán Sanchis-Trilles | Francisco Casacuberta
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

The filtering task of noisy parallel corpora in WMT2019 challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a noisy parallel corpora filtering system based on generating hypotheses by means of a translation model. We train translation models in both language pairs, Nepali–English and Sinhala–English, using the provided parallel corpora. We select the training subset for three language pairs (Nepali, Sinhala and Hindi to English) jointly using bilingual cross-entropy selection to create the best possible translation model for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences which can lower the translation score. These heuristics are based on sentence length, source and target similarity, and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, over which we achieve significant improvements in one of the conditions in the shared task. The designed filtering system is domain independent and all experiments are conducted using neural machine translation.
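
The scoring step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which BLEU smoothing was used, so the add-one smoothing on orders above unigram is an assumption, and the function names (`smoothed_bleu`, `keep_pair`) and both thresholds (`min_bleu`, `max_len_ratio`) are hypothetical values chosen only for illustration. Source language detection is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU; orders above unigram use add-one smoothing (assumed)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalised.
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_precision)


def keep_pair(source, target, hypothesis, min_bleu=0.3, max_len_ratio=2.0):
    """Decide whether a (source, target) pair survives filtering.

    `hypothesis` is the machine translation of `source`; both thresholds
    are hypothetical.
    """
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return False
    # Length rule: discard pairs whose lengths differ too much.
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
        return False
    # Similarity rule: discard pairs where the source was copied into the target.
    if source.strip() == target.strip():
        return False
    return smoothed_bleu(target, hypothesis) >= min_bleu
```

A pair is kept only when it passes the rule-based checks and the target side is close enough to what the translation model itself produces for the source side.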

2018

Active Learning for Interactive Neural Machine Translation of Data Streams
Álvaro Peris | Francisco Casacuberta
Proceedings of the 22nd Conference on Computational Natural Language Learning

We study the application of active learning techniques to the translation of unbounded data streams via interactive neural machine translation. The main idea is to select, from an unbounded stream of source sentences, those worth being supervised by a human agent. The user will interactively translate those samples. Once validated, these data are useful for adapting the neural machine translation model. We propose two novel methods for selecting the samples to be validated. We exploit the information from the attention mechanism of a neural machine translation system. Our experiments show that the inclusion of active learning techniques in this pipeline reduces the effort required during the process, while increasing the quality of the translation system. Moreover, it makes it possible to balance the human effort required for achieving a certain translation quality. In addition, our neural system outperforms classical approaches by a large margin.

A Machine Translation Approach for Modernizing Historical Documents Using Backtranslation
Miguel Domingo | Francisco Casacuberta
Proceedings of the 15th International Conference on Spoken Language Translation

Human language evolves with the passage of time. This makes historical documents hard for contemporary people to comprehend and, thus, limits their accessibility to scholars specialized in the time period in which a certain document was written. Modernization aims at breaking this language barrier and increasing the accessibility of historical documents to a broader audience. To do so, it generates a new version of a historical document, written in the modern version of the document’s original language. In this work, we propose several machine translation approaches for modernizing historical documents. We tested these approaches in different scenarios, obtaining very encouraging results.

2017

Adapting Neural Machine Translation with Parallel Synthetic Data
Mara Chinea-Ríos | Álvaro Peris | Francisco Casacuberta
Proceedings of the Second Conference on Machine Translation

2016

Interactive-Predictive Translation Based on Multiple Word-Segments
Miguel Domingo | Alvaro Peris | Francisco Casacuberta
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

Beyond Prefix-Based Interactive Translation Prediction
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta | José Miguel Benedi Ruiz
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

2014

This paper describes a pilot study with a computer-assisted translation workbench aiming at testing the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench.

This paper describes the field trial and subsequent evaluation of a post-editing workbench which is currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype of the workbench, this second prototype includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors’ gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.

Online optimisation of log-linear weights in interactive machine translation
Mara Chinea Rios | Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Whenever the quality provided by a machine translation system is not enough, a human expert is required to correct the sentences provided by the machine translation system. In such a setup, it is crucial that the system is able to learn from the errors that have already been corrected. In this paper, we analyse the applicability of discriminative ridge regression for learning the log-linear weights of a state-of-the-art machine translation system underlying an interactive machine translation framework, with encouraging results.

Efficient wordgraph for interactive translation prediction
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the 17th Annual conference of the European Association for Machine Translation

CASMACAT: cognitive analysis and statistical methods for advanced computer aided translation
Philipp Koehn | Michael Carl | Francisco Casacuberta | Eva Marcos
Proceedings of the 17th Annual conference of the European Association for Machine Translation

The New Thot Toolkit for Fully-Automatic and Interactive Statistical Machine Translation
Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Inference of Phrase-Based Translation Models via Minimum Description Length
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Interactive Machine Translation using Hierarchical Translation Models
Jesús González-Rubio | Daniel Ortiz-Martínez | José-Miguel Benedí | Francisco Casacuberta
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Improving the minimum Bayes’ risk combination of machine translation systems
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We investigate the problem of combining the outputs of different translation systems into a minimum Bayes’ risk consensus translation. We explore different risk formulations based on the BLEU score, and provide a dynamic programming decoding algorithm for each of them. In our experiments, these algorithms generated consensus translations with better risk, and more efficiently, than previous proposals.

Emprical study of a two-step approach to estimate translation quality
Jesús González-Rubio | J. Ramón Navarro-Cerdán | Francisco Casacuberta
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We present a method to estimate the quality of automatic translations when reference translations are not available. Quality estimation is addressed as a two-step regression problem where multiple features are combined to predict a quality score. Given a set of features, we aim at automatically extracting the variables that better explain translation quality, and use them to predict the quality score. The soundness of our approach is assessed by the encouraging results obtained in exhaustive experiments with several feature sets. Moreover, the studied approach is highly scalable, allowing us to employ hundreds of features to predict translation quality.

CASMACAT: Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation
Philipp Koehn | Michael Carl | Francisco Casacuberta | Eva Marcos
Proceedings of Machine Translation Summit XIV: European projects

Currently, a great effort is being made in the digitalisation of large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also at annotating them with their corresponding text transcriptions and translations into modern languages. Unfortunately, ancient languages have at their disposal scarce electronic resources to be exploited by natural language processing techniques. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging but interesting nature for future development.

This paper presents the submissions of the PRHLT group for the evaluation campaign of the International Workshop on Spoken Language Translation. We focus on the development of reliable translation systems between syntactically different languages (DIALOG task) and on the efficient training of SMT models in resource-rich scenarios (TALK task).

Log-linear weight optimisation via Bayesian Adaptation in Statistical Machine Translation
Germán Sanchis-Trilles | Francisco Casacuberta
Coling 2010: Posters

Online Learning for Interactive Statistical Machine Translation
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the ACL 2010 Conference Short Papers

Potential scope of a fully-integrated architecture for speech translation
Alicia Pérez | María Inés Torres | Francisco Casacuberta
Proceedings of the 14th Annual conference of the European Association for Machine Translation

On the Use of Confidence Measures within an Interactive-predictive Machine Translation System
Jesús González-Rubio | Daniel Ortíz-Martínez | Francisco Casacuberta
Proceedings of the 14th Annual conference of the European Association for Machine Translation

2009

Statistical Post-Editing of a Rule-Based Machine Translation System
Antonio-L. Lagarda | Vicent Alabau | Francisco Casacuberta | Roberto Silva | Enrique Díaz-de-Liaño
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

GREAT: A Finite-State Machine Translation Toolkit Implementing a Grammatical Inference Approach for Transducer Inference (GIATI)
Jorge González | Francisco Casacuberta
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Interactive Machine Translation Based on Partial Statistical Phrase-based Alignments
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Proceedings of the International Conference RANLP-2009

2008

Improving Interactive Machine Translation via Mouse Actions
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jorge Civera | Francisco Casacuberta | Enrique Vidal | Hieu Hoang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

A finite-state framework for log-linear models in machine translation
Jorge González | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

A novel alignment model inspired on IBM Model 1
Jesús González-Rubio | Germán Sanchis-Trilles | Alfons Juan | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Applying boosting to statistical machine translation
Antonio L. Lagarda | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

Phrase-level alignment generation using a smoothed loglinear phrase-based statistical alignment model
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2007

An Integrated Architecture for Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

A comparison of linguistically and statistically enhanced models for speech-to-speech machine translation
Alicia Pérez | Víctor Guijarrubia | Raquel Justo | M. Inés Torres | Francisco Casacuberta
Proceedings of the Fourth International Workshop on Spoken Language Translation

The goal of this work is to improve current translation models by taking into account additional knowledge sources such as semantically motivated segmentation or statistical categorization. Specifically, two different approaches are discussed: on the one hand, a phrase-based approach and, on the other hand, categorization. For both approaches, both statistical and linguistic alternatives are explored. As for the translation framework, finite-state transducers are considered. These are versatile models that can be easily integrated on-the-fly with acoustic models for speech translation purposes. As regards the experimental framework, all the models presented were evaluated and compared taking confidence intervals into account.

Using word posterior probabilities in lattice translation
Vicente Alabau | Alberto Sanchis | Francisco Casacuberta
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper we describe the statistical machine translation system developed at ITI/UPV, which aims especially at speech recognition and statistical machine translation integration, for the evaluation campaign of the International Workshop on Spoken Language Translation (2007). The system we have developed takes advantage of an improved word lattice representation that uses word posterior probabilities. These word posterior probabilities are then added as a feature to a log-linear model. This model includes a stochastic finite-state transducer which allows an easy lattice integration. Furthermore, it provides a statistical phrase-based reordering model that is able to perform local reorderings of the output. We have tested this model on the Italian-English corpus, for clean text, 1-best ASR and lattice ASR inputs. The results and conclusions of such experiments are reported at the end of this paper.

Improving speech-to-speech translation using word posterior probabilities
Vicente Alabau | Alberto Sanchis | Francisco Casacuberta
Proceedings of Machine Translation Summit XI: Papers

Combining translation models in statistical machine translation
Jesús Andrés-Ferrer | Ismael Garcia-Varea | Francisco Casacuberta
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Reordering via n-best lists for Spanish-Basque translation
Germán Sanchis | Francisco Casacuberta
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Proceedings of the Second Workshop on Statistical Machine Translation

2006

Generalized Stack Decoding Algorithms for Statistical Machine Translation
Daniel Ortiz Martínez | Ismael García Varea | Francisco Casacuberta
Proceedings on the Workshop on Statistical Machine Translation

Statistical Phrase-Based Models for Interactive Computer-Assisted Translation
Jesús Tomás | Francisco Casacuberta
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Thot: a Toolkit To Train Phrase-based Statistical Translation Models
Daniel Ortiz-Martínez | Ismael García-Varea | Francisco Casacuberta
Proceedings of Machine Translation Summit X: Papers

In this paper, we present the Thot toolkit, a set of tools to train phrase-based models for statistical machine translation, which is publicly available as open source software. The toolkit obtains phrase-based models from word-based alignment models; to our knowledge, this functionality has not been offered by any publicly available toolkit. The Thot toolkit also implements a new way of estimating phrase models, which yields more complete phrase models than the methods described in the literature, including a segmentation length submodel. The toolkit output can be given in different formats in order to be used by other statistical machine translation tools like Pharaoh, a beam search decoder for phrase-based alignment models that was used to perform translation experiments with the generated models. Additionally, the Thot toolkit can be used to obtain the best alignment between a sentence pair at phrase level.

2004

Machine Translation with Inferred Stochastic Finite-State Transducers
Francisco Casacuberta | Enrique Vidal
Computational Linguistics, Volume 30, Number 2, June 2004

Translation Memories Enrichment by Statistical Bilingual Segmentation
Francisco Nevado | Francisco Casacuberta | Josu Landa
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A majority of Machine Aided Translation systems are based on comparisons between a source sentence and reference sentences stored in Translation Memories (TMs). The translation search is done by looking for sentences in a database which are similar to the source sentence. TMs have two basic limitations: the dependency on the repetition of complete sentences and the high cost of building a TM. As human translators do not only remember sentences from their preceding translations, but they also decompose the sentence to be translated and work with smaller units, it would be desirable to enrich the TM database with smaller translation units. This enrichment should also be automatic in order not to increase the cost of building a TM. We propose the application of two automatic bilingual segmentation techniques based on statistical translation methods in order to create new, shorter bilingual segments to be included in a TM database. An evaluation of the two techniques is carried out for a bilingual Basque-Spanish task.

2003

On the use of statistical machine-translation techniques within a memory-based translation system (AMETRA)
Daniel Ortíz | Ismael García-Varea | Francisco Casacuberta | Antonio Lagarda | Jorge González
Proceedings of Machine Translation Summit IX: Papers

The goal of the AMETRA project is to make a computer-assisted translation tool from the Spanish language to the Basque language under the memory-based translation framework. The system is based on a large collection of bilingual word-segments. These segments are obtained using linguistic or statistical techniques from a Spanish-Basque bilingual corpus consisting of sentences extracted from the Basque Country’s official government record. One of the tasks within the global information document of the AMETRA project is to study the combination of well-known statistical techniques for the translation of short sequences and techniques for memory-based translation. In this paper, we address the problem of constructing a statistical module to deal with the task of translating segments. The task undertaken in the AMETRA project is compared with other existing translation tasks. This study includes the results of some preliminary experiments we have carried out using well-known statistical machine translation tools and techniques.

Adapting finite-state translation to the TransType2 project
Elsa Cubel | Jorge González | Antonio Lagarda | Francisco Casacuberta | Alfons Juan | Enrique Vidal
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

Parallel Corpora Segmentation Using Anchor Words
Francisco Nevado | Francisco Casacuberta | Enrique Vidal
Proceedings of the 7th International EAMT workshop on MT and other language technology tools, Improving MT through other language technology tools, Resource and tools for building MT at EACL 2003

A Quantitative Method for Machine Translation Evaluation
Jesús Tomás | Josep Àngel Mas | Francisco Casacuberta
Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?

2002

Improving Alignment Quality in Statistical Machine Translation Using Context-dependent Maximum Entropy Models
Ismael García Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
COLING 2002: The 19th International Conference on Computational Linguistics

Architectures for Speech-to-Speech Translation Using Finite-state Models
Francisco Casacuberta | Enrique Vidal | Juan Miguel Vilar
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

Efficient integration of maximum entropy lexicon models within the training of statistical alignment models
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

Maximum entropy (ME) models have been successfully applied to many natural language problems. In this paper, we show how to integrate ME models efficiently within a maximum likelihood training scheme of statistical machine translation models. Specifically, we define a set of context-dependent ME lexicon models and we present how to perform an efficient training of these ME models within the conventional expectation-maximization (EM) training of statistical translation models. Experimental results are also given in order to demonstrate how these ME models improve the results obtained with the traditional translation models. The results are presented by means of alignment quality comparing the resulting alignments with manually annotated reference alignments.

2001

Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Search algorithms for statistical machine translation based on dynamic programming and pruning techniques
Ismael García-Varea | Francisco Casacuberta
Proceedings of Machine Translation Summit VIII

The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with statistical machine translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently, a dynamic programming-like algorithm for the IBM-Model 2 has been proposed which is based on an iterative process of refinement solutions. A new dynamic programming-like algorithm is proposed here to deal with more complex IBM models (models 3 to 5). The computational cost of the algorithm is reduced by using an alignment-based pruning technique. Experimental results with the so-called “Tourist Task” are also presented.

A finite-state, rule-based morphological analyser is presented here, within the framework of the machine translation system TAVAL. This morphological analyser introduces specific features which are particularly useful for translation, such as the detection and morphological tagging of word groups that act as a single lexical unit for translation purposes. The case where words in one such group are not strictly contiguous is also covered. A brief description of the Spanish-to-Catalan and Catalan-to-Spanish translation system TAVAL is given in the paper.

Monotone statistical translation using word groups
Jesús Tomás | Francisco Casacuberta
Proceedings of Machine Translation Summit VIII

A new system for statistical natural language translation for languages with similar grammar is introduced. Specifically, it can be used with Romance languages, such as French, Spanish or Catalan. The statistical translation uses two sources of information: a language model and a translation model. The language model used is a standard trigram model. A new approach is defined in the translation model. The two main properties of the translation model are that the translation probabilities are computed between groups of words and that the alignment between those groups is monotone. That is, the order of the word groups in the source sentence is preserved in the target sentence. Once the translation model has been defined, we present an algorithm to infer its parameters from training samples. The translation process is carried out with an efficient algorithm based on stack-decoding. Finally, we present some translation results from Catalan to Spanish and compare our model with other conventional models.