An Arabizi-English social media statistical machine translation system

Jonathan May, Yassine Benjira, Abdessamad Echihabi


Abstract
We present a machine translation engine that can translate romanized Arabic, often known as Arabizi, into English. With such a system we can, for the first time, translate the massive amounts of Arabizi that are generated every day in the social media sphere but that until now have been uninterpretable by automated means. We accomplish our task by leveraging a machine translation system trained on non-Arabizi social media data and a weighted finite-state transducer-based Arabizi-to-Arabic conversion module, equipped with an Arabic character-based n-gram language model. The resulting system allows high-capacity, on-the-fly translation from Arabizi to English. We demonstrate via several experiments that our performance is quite close to the theoretical maximum attained by perfect deromanization of Arabizi input. This constitutes the first presentation of a high-capacity, end-to-end social media Arabizi-to-English translation system.
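To make the conversion step in the abstract more concrete, the following is a minimal sketch of deromanization guided by a character n-gram language model. It is not the authors' implementation: the weighted finite-state transducer is replaced here by a plain Viterbi search, and the mapping table `CANDIDATES`, the bigram order, the add-one smoothing, the toy training words, and the function names `train_bigram_lm` and `deromanize` are all assumptions made for illustration. Vowel deletion and multi-character mappings (such as "sh" or "kh") are deliberately left out.

```python
import math
from collections import defaultdict

# Hypothetical romanization table (an assumption, not taken from the paper):
# each Arabizi symbol maps to one or more candidate Arabic characters.
CANDIDATES = {
    "7": ["ح"],
    "h": ["ه", "ح"],  # ambiguous: "h" is often typed for either letter
    "a": ["ا"],
    "l": ["ل"],
    "b": ["ب"],
}

def train_bigram_lm(words):
    """Return logprob(prev, cur) for an add-one-smoothed character bigram model."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = {"</s>"}
    for word in words:
        chars = ["<s>"] + list(word) + ["</s>"]
        vocab.update(chars[1:])
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1

    def logprob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + 1) / (total + len(vocab)))

    return logprob

def deromanize(arabizi, logprob):
    """Viterbi search for the best Arabic string; the state is the last character emitted."""
    hyps = {"<s>": (0.0, "")}  # last character -> (score, output so far)
    for sym in arabizi:
        new_hyps = {}
        for last, (score, out) in hyps.items():
            # Symbols missing from the table pass through unchanged.
            for cand in CANDIDATES.get(sym, [sym]):
                new_score = score + logprob(last, cand)
                if cand not in new_hyps or new_score > new_hyps[cand][0]:
                    new_hyps[cand] = (new_score, out + cand)
        hyps = new_hyps
    # Include the end-of-word transition when picking the final hypothesis.
    best_last = max(hyps, key=lambda c: hyps[c][0] + logprob(c, "</s>"))
    return hyps[best_last][1]

# Toy training data (an assumption): three Arabic words.
lm = train_bigram_lm(["حال", "حالة", "حب"])
print(deromanize("hal", lm))  # prints حال
```

In this toy setting the language model resolves the ambiguous "h" toward ح because every training word begins with that letter. A weighted transducer would additionally attach costs to the mappings themselves, which this sketch omits.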
Anthology ID:
2014.amta-researchers.25
Volume:
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track
Month:
October 22-26
Year:
2014
Address:
Vancouver, Canada
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
329–341
URL:
https://aclanthology.org/2014.amta-researchers.25
Cite (ACL):
Jonathan May, Yassine Benjira, and Abdessamad Echihabi. 2014. An Arabizi-English social media statistical machine translation system. In Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track, pages 329–341, Vancouver, Canada. Association for Machine Translation in the Americas.
Cite (Informal):
An Arabizi-English social media statistical machine translation system (May et al., AMTA 2014)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2014.amta-researchers.25.pdf