Francisco Nevado


2004

Translation Memories Enrichment by Statistical Bilingual Segmentation
Francisco Nevado | Francisco Casacuberta | Josu Landa
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Most Machine Aided Translation systems are based on comparisons between a source sentence and reference sentences stored in Translation Memories (TMs). A translation is found by searching a database for sentences similar to the source sentence. TMs have two basic limitations: they depend on complete sentences being repeated, and they are costly to build. Human translators not only remember sentences from their previous translations but also decompose the sentence to be translated and work with smaller units. It would therefore be desirable to enrich the TM database with smaller translation units. This enrichment should also be automatic, so that it does not increase the cost of building a TM. We propose applying two automatic bilingual segmentation techniques, based on statistical translation methods, to create new, shorter bilingual segments for inclusion in a TM database. The two techniques are evaluated on a Basque-Spanish bilingual task.

2003

Parallel Corpora Segmentation Using Anchor Words
Francisco Nevado | Francisco Casacuberta | Enrique Vidal
Proceedings of the 7th International EAMT workshop on MT and other language technology tools, Improving MT through other language technology tools, Resource and tools for building MT at EACL 2003