Leveraging Latent Topic Information to Improve Product Machine Translation

Bryan Zhang, Stephan Walter, Amita Misra, Liling Tan


Abstract
Meeting the expectations of e-commerce customers involves offering a seamless online shopping experience in their preferred language. To achieve this, modern e-commerce platforms rely on machine translation systems to provide multilingual product information on a large scale. However, maintaining high-quality machine translation that can keep up with the ever-expanding volume of product data remains an open challenge for industrial machine translation systems. In this context, topical clustering emerges as a valuable approach, leveraging latent signals and interpretable textual patterns to potentially enhance translation quality and facilitate industry-scale translation data discovery. This paper proposes two innovative methods: topic-based data selection and topic-signal augmentation, both utilizing latent topic clusters to improve the quality of machine translation in e-commerce. Furthermore, we present a data discovery workflow that utilizes topic clusters to effectively manage the growing multilingual product catalogs, addressing the challenges posed by their expansion.
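To make the idea of topic-based data selection concrete, below is a minimal, hypothetical sketch (not the authors' implementation): product texts are clustered into latent topics with LDA, and sentences whose dominant topic matches a target topic are selected, e.g. to build topic-focused training data. The model choice (scikit-learn LDA), the `target_topic` index, and the selection threshold are illustrative assumptions.

```python
# Hypothetical sketch: cluster product texts into latent topics with LDA,
# then select sentences whose probability mass on a target topic is high.
# Not the paper's method; parameters and thresholds are assumptions.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

product_texts = [
    "wireless bluetooth headphones with noise cancellation",
    "stainless steel kitchen knife set with wooden block",
    "running shoes lightweight breathable mesh",
    "usb-c charging cable fast charge compatible",
]

# Bag-of-words features over the source-side product text.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(product_texts)

# Fit a small LDA model; n_components would be tuned on real catalog data.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # shape: (n_docs, n_topics)

# Topic-based data selection: keep texts whose mass on the target topic
# exceeds a threshold.
target_topic = 0
threshold = 0.6
selected = [
    text for text, dist in zip(product_texts, doc_topics)
    if dist[target_topic] >= threshold
]
print(selected)
```

The companion idea, topic-signal augmentation, could plausibly be realized by exposing the same cluster assignment to the translation model, for instance by prepending a topic tag token to the source sentence; how the paper injects the signal is described in the full text, and the tag-token framing here is only one possible reading.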
Anthology ID: 2023.mtsummit-users.10
Volume: Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Month: September
Year: 2023
Address: Macau SAR, China
Editors: Masaru Yamada, Felix do Carmo
Venue: MTSummit
Publisher: Asia-Pacific Association for Machine Translation
Pages: 109–118
URL: https://aclanthology.org/2023.mtsummit-users.10
Cite (ACL): Bryan Zhang, Stephan Walter, Amita Misra, and Liling Tan. 2023. Leveraging Latent Topic Information to Improve Product Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track, pages 109–118, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal): Leveraging Latent Topic Information to Improve Product Machine Translation (Zhang et al., MTSummit 2023)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.mtsummit-users.10.pdf