Building a Better Bitext for Structurally Different Languages through Self-training
Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha
Abstract
We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining a pivot language with self-training. The pivot language lets existing translation models bootstrap the alignment, and the self-training procedure then improves that alignment at both the document and sentence levels. We also propose several evaluation methods for the resulting alignment.
- Anthology ID:
- W17-5601
- Volume:
- Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Haithem Afli, Chao-Hong Liu
- Venue:
- WS
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 1–10
- URL:
- https://aclanthology.org/W17-5601
- Cite (ACL):
- Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, and Jeong-Won Cha. 2017. Building a Better Bitext for Structurally Different Languages through Self-training. In Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora, pages 1–10, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Building a Better Bitext for Structurally Different Languages through Self-training (Park et al., 2017)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W17-5601.pdf