Abstract
Meeting the expectations of e-commerce customers involves offering a seamless online shopping experience in their preferred language. To achieve this, modern e-commerce platforms rely on machine translation systems to provide multilingual product information on a large scale. However, maintaining high-quality machine translation that can keep up with the ever-expanding volume of product data remains an open challenge for industrial machine translation systems. In this context, topical clustering emerges as a valuable approach, leveraging latent signals and interpretable textual patterns to potentially enhance translation quality and facilitate industry-scale translation data discovery. This paper proposes two innovative methods: topic-based data selection and topic-signal augmentation, both utilizing latent topic clusters to improve the quality of machine translation in e-commerce. Furthermore, we present a data discovery workflow that utilizes topic clusters to effectively manage the growing multilingual product catalogs, addressing the challenges posed by their expansion.

- Anthology ID: 2023.mtsummit-users.10
- Volume: Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
- Month: September
- Year: 2023
- Address: Macau SAR, China
- Editors: Masaru Yamada, Felix do Carmo
- Venue: MTSummit
- Publisher: Asia-Pacific Association for Machine Translation
- Pages: 109–118
- URL: https://aclanthology.org/2023.mtsummit-users.10
- Cite (ACL): Bryan Zhang, Stephan Walter, Amita Misra, and Liling Tan. 2023. Leveraging Latent Topic Information to Improve Product Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track, pages 109–118, Macau SAR, China. Asia-Pacific Association for Machine Translation.
- Cite (Informal): Leveraging Latent Topic Information to Improve Product Machine Translation (Zhang et al., MTSummit 2023)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.mtsummit-users.10.pdf