Enric Monte
2012
Improving English to Spanish Out-of-Domain Translations by Morphology Generalization and Generation
Lluís Formiga | Adolfo Hernández | José B. Mariño | Enric Monte
Workshop on Monolingual Machine Translation
This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translation in English-to-Spanish phrase-based MT. It examines whether the morphological richness of the target language causes poor-quality translation of out-of-domain text. Specifically, the approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source- and target-language sentences. The results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.
The TALP-UPC phrase-based translation systems for WMT12: Morphology simplification and domain adaptation
Lluís Formiga | Carlos A. Henríquez Q. | Adolfo Hernández | José B. Mariño | Enric Monte | José A. R. Fonollosa
Proceedings of the Seventh Workshop on Statistical Machine Translation
2008
Using Reordering in Statistical Machine Translation based on Alignment Block Classification
Marta R. Costa-jussà | José A. R. Fonollosa | Enric Monte
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Statistical Machine Translation (SMT) is based on alignment models that learn word correspondences between the source and target languages from bilingual corpora. These models are assumed to be capable of learning reorderings of word sequences. However, the difference in word order between two languages is one of the most important sources of errors in SMT. This paper proposes a Recursive Alignment Block Classification algorithm (RABCA) that can take advantage of inductive learning to solve reordering problems. The algorithm should be able to cope with swapping examples seen during training; it should infer properties that might permit reordering pairs of blocks (sequences of words) that did not appear during training; and finally it should be robust to training errors and ambiguities. Experiments are reported on the EuroParl task, and RABCA is tested with two state-of-the-art SMT systems: a phrase-based system and an N-gram-based system. In both cases, RABCA improves results.
Co-authors
- Lluis Formiga 2
- Adolfo Hernández H. 2
- José B. Mariño 2
- José A. R. Fonollosa 2
- Carlos Henríquez 1