@inproceedings{wang-zhang-2022-addressing,
    title = "Addressing Asymmetry in Multilingual Neural Machine Translation with Fuzzy Task Clustering",
    author = "Wang, Qian  and
      Zhang, Jiajun",
    editor = "Calzolari, Nicoletta  and
      Huang, Chu-Ren  and
      Kim, Hansaem  and
      Pustejovsky, James  and
      Wanner, Leo  and
      Choi, Key-Sun  and
      Ryu, Pum-Mo  and
      Chen, Hsin-Hsi  and
      Donatelli, Lucia  and
      Ji, Heng  and
      Kurohashi, Sadao  and
      Paggio, Patrizia  and
      Xue, Nianwen  and
      Kim, Seokhwan  and
      Hahm, Younggyun  and
      He, Zhong  and
      Lee, Tony Kyungil  and
      Santus, Enrico  and
      Bond, Francis  and
      Na, Seung-Hoon",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.coling-1.455/",
    pages = "5129--5141",
    abstract = "Multilingual neural machine translation (NMT) enables positive knowledge transfer among multiple translation tasks with a shared underlying model, but a unified multilingual model usually suffers from capacity bottleneck when tens or hundreds of languages are involved. A possible solution is to cluster languages and train individual model for each cluster. However, the existing clustering methods based on language similarity cannot handle the asymmetric problem in multilingual NMT, i.e., one translation task A can benefit from another translation task B but task B will be harmed by task A. To address this problem, we propose a fuzzy task clustering method for multilingual NMT. Specifically, we employ task affinity, defined as the loss change of one translation task caused by the training of another, as the clustering criterion. Next, we cluster the translation tasks based on the task affinity, such that tasks from the same cluster can benefit each other. For each cluster, we further find out a set of auxiliary translation tasks that benefit the tasks in this cluster. In this way, the model for each cluster is trained not only on the tasks in the cluster but also on the auxiliary tasks. We conduct extensive experiments for one-to-many, manyto-one, and many-to-many translation scenarios to verify the effectiveness of our method."
}