Chang-Uk Shin
2017
Building a Better Bitext for Structurally Different Languages through Self-training
Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha
Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora
We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining a pivot language with self-training. The pivot language lets us reuse existing translation models to bootstrap the alignment, and a self-training procedure then improves that alignment at both the document and the sentence level. We also propose several evaluation methods for the resulting alignment.
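The abstract does not specify the scoring model or the training loop. The sketch below is therefore only a rough illustration of the general idea of pivot-bootstrapped, self-trained sentence alignment. It is not the authors' method, and every choice in it is an assumption. Pivot-language translations of both sides are taken as given (in practice they would come from existing MT models). Token-set Jaccard overlap in the pivot serves as the initial score, alignment is a greedy one-to-one matching, and the "self-training" step re-estimates a toy word lexicon from the current alignments and uses it to rescore pairs in the next round. The names `self_train_align`, `learn_lexicon`, `threshold`, and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import product


def jaccard(a, b):
    """Token-set overlap between two pivot-language translations (assumed score)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def lexicon_score(src, tgt, lexicon):
    """Share of source words whose learned translation appears in tgt,
    normalized by the longer sentence so the score is not one-sided."""
    src_set, tgt_set = set(src), set(tgt)
    if not src_set or not tgt_set:
        return 0.0
    hits = sum(1 for w in src_set if lexicon.get(w) in tgt_set)
    return hits / max(len(src_set), len(tgt_set))


def greedy_align(scores, threshold):
    """Greedy one-to-one matching: best-scoring pairs first, each sentence used once."""
    used_src, used_tgt, pairs = set(), set(), []
    for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s < threshold:
            break
        if i not in used_src and j not in used_tgt:
            pairs.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(pairs)


def learn_lexicon(src_sents, tgt_sents, pairs, min_count=2):
    """Re-estimate a word lexicon from the current (self-labelled) alignments:
    each source word maps to the target word it co-occurs with most often."""
    counts = {}
    for i, j in pairs:
        for w in sorted(set(src_sents[i])):
            for v in sorted(set(tgt_sents[j])):
                counts.setdefault(w, Counter())[v] += 1
    lexicon = {}
    for w, cands in counts.items():
        v, c = max(sorted(cands.items()), key=lambda kv: kv[1])
        if c >= min_count:
            lexicon[w] = v
    return lexicon


def self_train_align(src_sents, tgt_sents, src_pivot, tgt_pivot,
                     rounds=3, threshold=0.5):
    """Hypothetical loop: align, learn from own alignments, realign."""
    lexicon, pairs = {}, []
    for _ in range(rounds):
        scores = {}
        for i, j in product(range(len(src_sents)), range(len(tgt_sents))):
            # Pivot evidence, which the self-trained lexicon can raise but not lower.
            scores[(i, j)] = max(jaccard(src_pivot[i], tgt_pivot[j]),
                                 lexicon_score(src_sents[i], tgt_sents[j], lexicon))
        pairs = greedy_align(scores, threshold)
        lexicon = learn_lexicon(src_sents, tgt_sents, pairs)
    return pairs


# Toy data: sentence 3 has a failed pivot translation on the source side.
src_sents = [["x", "y"], ["y", "z"], ["x", "z"], ["x", "y", "z"]]
tgt_sents = [["X", "Y"], ["Y", "Z"], ["X", "Z"], ["X", "Y", "Z"]]
src_pivot = [["p", "q"], ["q", "r"], ["p", "r"], ["unk"]]
tgt_pivot = [["p", "q"], ["q", "r"], ["p", "r"], ["p", "q", "r"]]

print(self_train_align(src_sents, tgt_sents, src_pivot, tgt_pivot, rounds=1))
# -> [(0, 0), (1, 1), (2, 2)]
print(self_train_align(src_sents, tgt_sents, src_pivot, tgt_pivot, rounds=3))
# -> [(0, 0), (1, 1), (2, 2), (3, 3)]
```

With a single round, only the pairs supported by the pivot are found. After the lexicon has been learned from those confident pairs, the sentence whose pivot translation failed is recovered as well. Scoring and document-level alignment in a real system would of course be far richer than this toy.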