Abstract
We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.
 - Anthology ID:
 - W19-5438
 - Volume:
 - Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
 - Month:
 - August
 - Year:
 - 2019
 - Address:
 - Florence, Italy
 - Venue:
 - WMT
 - SIG:
 - SIGMT
 - Publisher:
 - Association for Computational Linguistics
 - Pages:
 - 277–281
 - URL:
 - https://aclanthology.org/W19-5438
 - DOI:
 - 10.18653/v1/W19-5438
 - Cite (ACL):
 - Murathan Kurfalı and Robert Östling. 2019. Noisy Parallel Corpus Filtering through Projected Word Embeddings. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 277–281, Florence, Italy. Association for Computational Linguistics.
 - Cite (Informal):
 - Noisy Parallel Corpus Filtering through Projected Word Embeddings (Kurfalı & Östling, WMT 2019)
 - PDF:
 - https://preview.aclanthology.org/ingestion-script-update/W19-5438.pdf
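The abstract names the idea but not the exact procedure, so the following is only a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes three things: (1) word alignments between the noisy low-resource/high-resource sentence pairs come from some external aligner; (2) a low-resource word's vector is the mean of the high-resource embeddings it is aligned to (the "projection"); and (3) a sentence pair is scored by the cosine similarity of its averaged word vectors. The function names, the alignment format, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def project_embeddings(corpus, hi_vecs):
    """Assign each low-resource word the mean of the high-resource vectors it aligns to.

    corpus: iterable of (lo_tokens, hi_tokens, links), where each link (i, j)
    means lo_tokens[i] is aligned to hi_tokens[j] (hypothetical format).
    hi_vecs: dict mapping high-resource words to embedding lists.
    """
    sums = {}
    counts = defaultdict(int)
    for lo_tokens, hi_tokens, links in corpus:
        for i, j in links:
            vec = hi_vecs.get(hi_tokens[j])
            if vec is None:
                continue  # skip high-resource words that have no embedding
            word = lo_tokens[i]
            if word not in sums:
                sums[word] = [0.0] * len(vec)
            sums[word] = [s + v for s, v in zip(sums[word], vec)]
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def sentence_vector(tokens, vecs):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vecs[t] for t in tokens if t in vecs]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score_pair(lo_tokens, hi_tokens, lo_vecs, hi_vecs):
    """Score a candidate sentence pair; pairs with no known words get 0.0."""
    u = sentence_vector(lo_tokens, lo_vecs)
    v = sentence_vector(hi_tokens, hi_vecs)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


if __name__ == "__main__":
    # Toy two-dimensional "high-resource" embeddings (hypothetical).
    hi_vecs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    corpus = [(["lo_cat"], ["cat"], [(0, 0)]), (["lo_dog"], ["dog"], [(0, 0)])]
    lo_vecs = project_embeddings(corpus, hi_vecs)
    print(score_pair(["lo_cat"], ["cat"], lo_vecs, hi_vecs))  # 1.0
    print(score_pair(["lo_cat"], ["dog"], lo_vecs, hi_vecs))  # 0.0
```

Under these assumptions, cleaning amounts to ranking all candidate pairs by `score_pair` and keeping the highest-scoring ones, for example those above some threshold or within a fixed token budget. Any such cut-off would be an arbitrary, hypothetical choice here. Averaging aligned vectors keeps the projection cheap, since it needs only an aligner and pretrained high-resource embeddings.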