Conference of the European Association for Machine Translation (2011)


up

bib (full) Proceedings of the 15th Annual Conference of the European Association for Machine Translation

We present a novel strategy to derive new translation units using an additional bilingual corpus and a previously trained SMT system. The units were used to adapt the SMT system. The derivation process can be applied when the additional corpus is very small compared with the original train corpus and it does not require to compute new word alignments using all corpora. The strategy is based in the Levenshtein Distance and its resulting path. We reported a statistically significant improvement, with a confidence level of 99%, when adapting an Ngram-based Catalan-Spanish system using an additional corpus that represents less than 0.5% of the original train corpus. The additional translation units were able to solve morphological and lexical errors and added previously unknown words to the vocabulary.