Bryan Zhang


Machine translation impact in E-commerce multilingual search
Bryan Zhang | Amita Misra
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Previous work suggests that performance of cross-lingual information retrieval correlates highly with the quality of Machine Translation. However, there may be a threshold beyond which improving query translation quality yields little or no benefit to further improve the retrieval performance. This threshold may depend upon multiple factors including the source and target languages, the existing MT system quality and the search pipeline. In order to identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to the presence of different levels of MT quality using experimental datasets collected from actual traffic. We systematically improve the performance of our MT systems quality on language pairs as measured by MT evaluation metrics including Bleu and Chrf to determine their impact on search precision metrics and extract signals that help to guide the improvement strategies. Using this information we develop techniques to compare query translations for multiple language pairs and identify the most promising language pairs to invest and improve.

Improve MT for Search with Selected Translation Memory using Search Signals
Bryan Zhang
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

Multilingual search is indispensable for a seamless e-commerce experience. E-commerce search engines typically support multilingual search by cascading a machine translation step before searching the index in its primary language. In practice, search query translation usually involves a translation memory matching step before machine translation. A translation memory (TM) can (i) effectively enforce terminologies for specific brands or products (ii) reduce the computation footprint and latency for synchronous translation and, (iii) fix machine translation issues that cannot be resolved easily or quickly without retraining/tuning the machine translation engine in production. In this abstract, we will propose (1) a method of improving MT query translation using such TM entries when the TM entries are only sub-strings of a customer search query, and (2) an approach to selecting TM entries using search signals that can contribute to better search results.