Chang-Uk Shin


2017

Building a Better Bitext for Structurally Different Languages through Self-training
Jungyeul Park | Loïc Dugast | Jeen-Pyo Hong | Chang-Uk Shin | Jeong-Won Cha
Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora

We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment, and a self-training procedure makes it possible to achieve better alignment at both the document and sentence levels. We also propose several evaluation methods for the resulting alignment.