Translation Memories Enrichment by Statistical Bilingual Segmentation

Francisco Nevado, Francisco Casacuberta, Josu Landa


Abstract
A majority of Machine Aided Translation systems are based on comparisons between a source sentence and reference sentences stored in Translation Memories (TMs). The translation search is done by looking for sentences in a database which are similar to the source sentence. TMs have two basic limitations: the dependency on the repetition of complete sentences and the high cost of building a TM. As human translators do not only remember sentences from their preceding translations, but they also decompose the sentence to be translated and work with smaller units, it would be desirable to enrich the TM database with smaller translation units. This enrichment should also be automatic in order not to increase the cost of building a TM. We propose the application of two automatic bilingual segmentation techniques based on statistical translation methods in order to create new, shorter bilingual segments to be included in a TM database. An evaluation of the two techniques is carried out for a bilingual Basque-Spanish task.
Anthology ID:
L04-1262
Volume:
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Month:
May
Year:
2004
Address:
Lisbon, Portugal
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/443.pdf
DOI:
Bibkey:
Cite (ACL):
Francisco Nevado, Francisco Casacuberta, and Josu Landa. 2004. Translation Memories Enrichment by Statistical Bilingual Segmentation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
Cite (Informal):
Translation Memories Enrichment by Statistical Bilingual Segmentation (Nevado et al., LREC 2004)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/443.pdf