Stephan Walter


2024

pdf
Don’t Just Translate, Summarize Too: Cross-lingual Product Title Generation in E-commerce
Bryan Zhang | Taichi Nakatani | Daniel Vidal Hussey | Stephan Walter | Liling Tan
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024

Making product titles informative and concise is vital to delighting e-commerce customers. Recent advances have successfully applied monolingual product title summarization to shorten lengthy product titles. This paper explores the cross-lingual product title generation task that summarizes and translates the source language product title to a shortened product title in the target language. Our main contributions are as follows, (i) we investigate the optimal product title length within the scope of e-commerce localization, (ii) we introduce a simple yet effective data filtering technique to train a length-aware machine translation system and compare it to a publicly available LLM, (iii) we propose an automatic approach to validate experimental results using an open-source LLM without human input and show that these evaluation results are consistent with human preferences.

2023

pdf
Enhancing Arabic Machine Translation for E-commerce Product Information: Data Quality Challenges and Innovative Selection Approaches
Bryan Zhang | Salah Danial | Stephan Walter
Proceedings of ArabicNLP 2023

Product information in e-commerce is usually localized using machine translation (MT) systems. Arabic language has rich morphology and dialectal variations, so Arabic MT in e-commerce training requires a larger volume of data from diverse data sources; Given the dynamic nature of e-commerce, such data needs to be acquired periodically to update the MT. Consequently, validating the quality of training data periodically within an industrial setting presents a notable challenge. Meanwhile, the performance of MT systems is significantly impacted by the quality and appropriateness of the training data. Hence, this study first examines the Arabic MT in e-commerce and investigates the data quality challenges for English-Arabic MT in e-commerce then proposes heuristics-based and topic-based data selection approaches to improve MT for product information. Both online and offline experiment results have shown our proposed approaches are effective, leading to improved shopping experiences for customers.

pdf
Leveraging Latent Topic Information to Improve Product Machine Translation
Bryan Zhang | Stephan Walter | Amita Misra | Liling Tan
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Meeting the expectations of e-commerce customers involves offering a seamless online shopping experience in their preferred language. To achieve this, modern e-commerce platforms rely on machine translation systems to provide multilingual product information on a large scale. However, maintaining high-quality machine translation that can keep up with the ever-expanding volume of product data remains an open challenge for industrial machine translation systems. In this context, topical clustering emerges as a valuable approach, leveraging latent signals and interpretable textual patterns to potentially enhance translation quality and facilitate industry-scale translation data discovery. This paper proposes two innovative methods: topic-based data selection and topic-signal augmentation, both utilizing latent topic clusters to improve the quality of machine translation in e-commerce. Furthermore, we present a data discovery workflow that utilizes topic clusters to effectively manage the growing multilingual product catalogs, addressing the challenges posed by their expansion.

pdf
Brand Consistency for Multilingual E-commerce Machine Translation
Bryan Zhang | Stephan Walter | Saurabh Chetan Birari | Ozlem Eren
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

In the realm of e-commerce, it is crucial to ensure consistent localization of brand terms in product information translations. With the ever-evolving e-commerce landscape, new brands and their localized versions are consistently emerging. However, these diverse brand forms and aliases present a significant challenge in machine translation (MT). This study investigates MT brand consistency problem in multilingual e-commerce and proposes practical and sustainable solutions to maintain brand consistency in various scenarios within the e-commerce industry. Through experimentation and analysis of an English-Arabic MT system, we demonstrate the effectiveness of our proposed solutions.

2008

pdf
Linguistic Description and Automatic Extraction of Definitions from German Court Decisions
Stephan Walter
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses the use of computational linguistic technology to extract definitions from a large corpus of German court decisions. We present a corpus-based survey of definition structures used in this kind of document. We then evaluate the results of a definition extraction system that uses patterns identified in this survey to extract from dependency parsed text. We show how an automatically induced ranking function improves the quality of the search results of this system, and we discuss methods for the acquisition of further extraction rules.

2006

pdf
Automatic Extraction of Definitions from German Court Decisions
Stephan Walter | Manfred Pinkal
Proceedings of the Workshop on Information Extraction Beyond The Document