@inproceedings{hangya-fraser-2018-unsupervised,
    title = "An Unsupervised System for Parallel Corpus Filtering",
    author = "Hangya, Viktor  and
      Fraser, Alexander",
    editor = "Bojar, Ond{\v{r}}ej  and
      Chatterjee, Rajen  and
      Federmann, Christian  and
      Fishel, Mark  and
      Graham, Yvette  and
      Haddow, Barry  and
      Huck, Matthias  and
      Yepes, Antonio Jimeno  and
      Koehn, Philipp  and
      Monz, Christof  and
      Negri, Matteo  and
      N{\'e}v{\'e}ol, Aur{\'e}lie  and
      Neves, Mariana  and
      Post, Matt  and
      Specia, Lucia  and
      Turchi, Marco  and
      Verspoor, Karin",
    booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers",
    month = oct,
    year = "2018",
    address = "Belgium, Brussels",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/iwcs-25-ingestion/W18-6477/",
    doi = "10.18653/v1/W18-6477",
    pages = "882--887",
    abstract = "In this paper we describe LMU Munich{'}s submission for the \textit{WMT 2018 Parallel Corpus Filtering} shared task which addresses the problem of cleaning noisy parallel corpora. The task of mining and cleaning parallel sentences is important for improving the quality of machine translation systems, especially for low-resource languages. We tackle this problem in a fully unsupervised fashion relying on bilingual word embeddings created without any bilingual signal. After pre-filtering noisy data we rank sentence pairs by calculating bilingual sentence-level similarities and then remove redundant data by employing monolingual similarity as well. Our unsupervised system achieved good performance during the official evaluation of the shared task, scoring only a few BLEU points behind the best systems, while not requiring any parallel training data."
}Markdown (Informal)
[An Unsupervised System for Parallel Corpus Filtering](https://preview.aclanthology.org/iwcs-25-ingestion/W18-6477/) (Hangya & Fraser, WMT 2018)
ACL