Shruti Bhosale


2021

Facebook AI’s WMT21 News Translation Task Submission
Chau Tran | Shruti Bhosale | James Cross | Philipp Koehn | Sergey Edunov | Angela Fan
Proceedings of the Sixth Conference on Machine Translation

We describe Facebook’s multilingual model submission to the WMT2021 shared task on news translation. We participate in 14 language directions: English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese. To develop systems covering all these directions, we focus on multilingual models. We utilize data from all available sources (WMT, large-scale data mining, and in-domain backtranslation) to create high quality bilingual and multilingual baselines. Subsequently, we investigate strategies for scaling multilingual model size, such that one system has sufficient capacity for high quality representations of all eight languages. Our final submission is an ensemble of dense and sparse Mixture-of-Experts multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking. Compared to the previous year’s winning submissions, our multilingual system improved the translation quality on all language directions, with an average improvement of 2.0 BLEU. In the WMT2021 task, our system ranks first in 10 directions based on automatic evaluation.

2020

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
Shruti Bhosale | Kyra Yee | Sergey Edunov | Michael Auli
Proceedings of the Fifth Conference on Machine Translation

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling. The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naïve noisy channel modeling with modern sequence to sequence models is up to an order of magnitude slower than alternatives. We address this issue by introducing efficient approximations to make inference with the noisy channel approach as fast as strong ensembles while increasing accuracy. We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.

2013

Detecting Promotional Content in Wikipedia
Shruti Bhosale | Heath Vinicombe | Raymond Mooney
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing