Abstract
We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.
- Anthology ID: W19-5438
- Volume: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
- Month: August
- Year: 2019
- Address: Florence, Italy
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 277–281
- URL: https://aclanthology.org/W19-5438
- DOI: 10.18653/v1/W19-5438
- Cite (ACL): Murathan Kurfalı and Robert Östling. 2019. Noisy Parallel Corpus Filtering through Projected Word Embeddings. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 277–281, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Noisy Parallel Corpus Filtering through Projected Word Embeddings (Kurfalı & Östling, WMT 2019)
- PDF: https://preview.aclanthology.org/starsem-semeval-split/W19-5438.pdf