A Data-Centric Approach to Real-World Custom NMT for Arabic

Rebecca Jonsson, Ruba Jaikat, Abdallah Nasir, Nour Al-Khdour, Sara Alisis


Abstract
In this presentation, we describe our approach to taking Custom NMT to the next level by building tailor-made NMT that fits the needs of businesses seeking to scale in the Arabic-speaking world. In close collaboration with customers in the MENA region, and with a deep understanding of their data, we build a variety of NMT models that accommodate the unique challenges of the Arabic language. This session will provide insights into the challenges of acquiring, analyzing, and processing customer data in various sectors, and into how best to use this data to build high-quality Custom NMT models for English-Arabic. We will also share feedback from the use of these models in production. Finally, we will show how to use our translation management system to make the most of custom NMT by leveraging the models, fine-tuning them, and continuing to improve them over time.
Anthology ID:
2021.mtsummit-up.24
Volume:
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Month:
August
Year:
2021
Address:
Virtual
Editors:
Janice Campbell, Ben Huyck, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
Venue:
MTSummit
Publisher:
Association for Machine Translation in the Americas
Pages:
335–352
URL:
https://aclanthology.org/2021.mtsummit-up.24
Cite (ACL):
Rebecca Jonsson, Ruba Jaikat, Abdallah Nasir, Nour Al-Khdour, and Sara Alisis. 2021. A Data-Centric Approach to Real-World Custom NMT for Arabic. In Proceedings of Machine Translation Summit XVIII: Users and Providers Track, pages 335–352, Virtual. Association for Machine Translation in the Americas.
Cite (Informal):
A Data-Centric Approach to Real-World Custom NMT for Arabic (Jonsson et al., MTSummit 2021)
Presentation:
 2021.mtsummit-up.24.Presentation.pdf