Enhancing Arabic Machine Translation for E-commerce Product Information: Data Quality Challenges and Innovative Selection Approaches

Bryan Zhang, Salah Danial, Stephan Walter


Abstract
Product information in e-commerce is usually localized using machine translation (MT) systems. Arabic language has rich morphology and dialectal variations, so Arabic MT in e-commerce training requires a larger volume of data from diverse data sources; Given the dynamic nature of e-commerce, such data needs to be acquired periodically to update the MT. Consequently, validating the quality of training data periodically within an industrial setting presents a notable challenge. Meanwhile, the performance of MT systems is significantly impacted by the quality and appropriateness of the training data. Hence, this study first examines the Arabic MT in e-commerce and investigates the data quality challenges for English-Arabic MT in e-commerce then proposes heuristics-based and topic-based data selection approaches to improve MT for product information. Both online and offline experiment results have shown our proposed approaches are effective, leading to improved shopping experiences for customers.
Anthology ID:
2023.arabicnlp-1.13
Volume:
Proceedings of ArabicNLP 2023
Month:
December
Year:
2023
Address:
Singapore (Hybrid)
Editors:
Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues:
ArabicNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
150–157
Language:
URL:
https://aclanthology.org/2023.arabicnlp-1.13
DOI:
10.18653/v1/2023.arabicnlp-1.13
Bibkey:
Cite (ACL):
Bryan Zhang, Salah Danial, and Stephan Walter. 2023. Enhancing Arabic Machine Translation for E-commerce Product Information: Data Quality Challenges and Innovative Selection Approaches. In Proceedings of ArabicNLP 2023, pages 150–157, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Enhancing Arabic Machine Translation for E-commerce Product Information: Data Quality Challenges and Innovative Selection Approaches (Zhang et al., ArabicNLP-WS 2023)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.arabicnlp-1.13.pdf