Binu Jasim


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2020

pdf bib
PhraseOut: A Code Mixed Data Augmentation Method for MultilingualNeural Machine Tranlsation
Binu Jasim | Vinay Namboodiri | C V Jawahar
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Data Augmentation methods for Neural Machine Translation (NMT) such as back- translation (BT) and self-training (ST) are quite popular. In a multilingual NMT system, simply copying monolingual source sentences to the target (Copying) is an effective data augmentation method. Back-translation aug- ments parallel data by translating monolingual sentences in the target side to source language. In this work we propose to use a partial back- translation method in a multilingual setting. Instead of translating the entire monolingual target sentence back into the source language, we replace selected high confidence phrases only and keep the rest of the words in the target language itself. (We call this method PhraseOut). Our experiments on low resource multilingual translation models show that PhraseOut gives reasonable improvements over the existing data augmentation methods.