Proceedings of the 15th Annual Conference of the European Association for Machine Translation

Mikel L. Forcada, Heidi Depraetere, Vincent Vandeghinste (Editors)


Anthology ID:
2011.eamt-1
Month:
May 30–31
Year:
2011
Address:
Leuven, Belgium
Venue:
EAMT
Publisher:
European Association for Machine Translation
URL:
https://preview.aclanthology.org/bootstrap-5/2011.eamt-1/

We present a novel strategy for deriving new translation units from an additional bilingual corpus and a previously trained SMT system. The units are then used to adapt the SMT system. The derivation process can be applied when the additional corpus is very small compared with the original training corpus, and it does not require computing new word alignments over all corpora. The strategy is based on the Levenshtein distance and its resulting edit path. We report a statistically significant improvement, at a confidence level of 99%, when adapting an Ngram-based Catalan-Spanish system with an additional corpus that represents less than 0.5% of the original training corpus. The additional translation units resolved morphological and lexical errors and added previously unknown words to the vocabulary.
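
The abstract does not spell out how the Levenshtein distance and its path are used, so the following is only a minimal sketch of the underlying technique: a word-level edit distance with a backtrace that recovers the edit path. The function name `levenshtein_path`, the operation labels, and the idea of comparing a system output against a reference translation are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Edit = Tuple[str, str, str]  # (operation, hypothesis token, reference token)


def levenshtein_path(hyp: List[str], ref: List[str]) -> Tuple[int, List[Edit]]:
    """Return the word-level edit distance between hyp and ref, plus one optimal edit path.

    Hypothetical helper for illustration; unit costs for all edits are an assumption.
    """
    n, m = len(hyp), len(ref)

    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all of hyp[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all of ref[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion from hyp
                dist[i][j - 1] + 1,         # insertion from ref
            )

    # Backtrace from the bottom-right cell to recover one optimal path.
    path: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and hyp[i - 1] == ref[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            path.append(("match" if same else "substitute", hyp[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            path.append(("delete", hyp[i - 1], ""))
            i -= 1
        else:
            path.append(("insert", "", ref[j - 1]))
            j -= 1

    path.reverse()
    return dist[n][m], path


if __name__ == "__main__":
    # Toy token sequences, invented for illustration only.
    distance, edits = levenshtein_path(
        ["the", "cat", "sat"], ["the", "dog", "sat", "down"]
    )
    print(distance)
    for edit in edits:
        print(edit)
```

In a setting like the one described, the non-matching steps of such a path (substitutions, insertions, deletions) mark where an existing system's output diverges from a reference, which is the kind of information from which candidate translation units could be derived. How the path is actually mapped to units is not stated in the abstract.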