Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Philipp Koehn; Francisco Guzmán; Vishrav Chaudhary; Juan Pino

doi:10.18653/v1/W19-5404

Abstract

Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low-resource conditions of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities took part in this task.
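
The sub-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each sentence pair already has a quality score (higher is better) and that the fraction is measured in sentence pairs; the shared task itself may define the subset size differently, for example by word count. The function name `select_top_fraction` is hypothetical and is not taken from the task's tooling.

```python
from typing import List


def select_top_fraction(scores: List[float], fraction: float) -> List[int]:
    """Return the indices of the highest-scoring sentence pairs.

    Assumption: `fraction` is a share of sentence pairs (e.g. 0.02 or 0.10),
    not of tokens. Indices are returned in original corpus order.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Keep at least one pair so that tiny corpora still yield a subset.
    keep = max(1, int(len(scores) * fraction))
    # Rank pair indices by score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Hypothetical scores for a ten-pair corpus.
    scores = [0.1, 0.9, 0.5, 0.7, 0.3, 0.2, 0.8, 0.4, 0.6, 0.0]
    print(select_top_fraction(scores, 0.2))
```

The selected indices can then be used to write out the matching source and target lines as training data for a translation system.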

Anthology ID: W19-5404
Volume: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month: August
Year: 2019
Address: Florence, Italy
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 54–72
URL: https://aclanthology.org/W19-5404
DOI: 10.18653/v1/W19-5404
Cite (ACL): Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, and Juan Pino. 2019. Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 54–72, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions (Koehn et al., WMT 2019)
PDF: https://preview.aclanthology.org/ingestion-script-update/W19-5404.pdf
