Leveraging Product Catalog Patterns for Multilingual E-commerce Product Attribute Prediction

Bryan Zhang, Suleiman A. Khan, Stephan Walter


Abstract
E-commerce stores increasingly use Large Language Models (LLMs) to enhance catalog data quality through automated regeneration. A critical challenge is accurately predicting missing structured attribute values across multilingual product catalogs, where LLM performance varies significantly by language. While existing approaches leverage general knowledge through prompt engineering and external retrieval, more effective and accurate signals for attribute prediction can exist within the catalog ecosystem itself: similar products often share consistent patterns and structural relationships, and may already have the missing attributes filled. This paper therefore introduces PatternRAG, a novel retrieval-augmented system that strategically leverages existing product catalog entries to guide LLM predictions for missing attributes. Our approach introduces a multi-stage retrieval framework that progressively refines the search space based on product type and uses textual similarity, glance views, and brand relationships to identify the most relevant attribute-filled examples for LLM prediction guidance. Experiments on test sets across three major e-commerce stores in different languages (US, DE, FR) demonstrate substantial improvements in catalog data quality, achieving up to a 34% increase in recall and 0.8% in precision for attribute value prediction. At the catalog entry level, it also achieves up to a +43.32% increase in completeness and up to +2.83% in correctness.
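The multi-stage retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Product` fields, the `retrieve_examples` and `build_prompt` helpers, the ranking order (brand match, then title similarity, then glance views), and the use of `difflib` for textual similarity are all assumptions chosen only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Product:
    # Hypothetical catalog entry; field names are illustrative assumptions.
    title: str
    product_type: str
    brand: str
    glance_views: int
    attributes: dict = field(default_factory=dict)


def retrieve_examples(query, catalog, missing_attr, k=3):
    """Return up to k attribute-filled catalog entries to guide prediction."""
    # Stage 1: narrow the search space to the same product type and keep
    # only entries where the target attribute is actually filled.
    pool = [
        p for p in catalog
        if p.product_type == query.product_type and p.attributes.get(missing_attr)
    ]

    # Stage 2: rank the remaining entries. The priority order used here
    # (brand match, title similarity, glance views) is an assumption.
    def score(p):
        sim = SequenceMatcher(None, query.title.lower(), p.title.lower()).ratio()
        return (p.brand == query.brand, sim, p.glance_views)

    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(query, examples, missing_attr):
    """Format the retrieved entries as few-shot guidance for an LLM."""
    lines = [f"Predict the '{missing_attr}' attribute for the target product."]
    for ex in examples:
        lines.append(f"Example: {ex.title} -> {missing_attr}: {ex.attributes[missing_attr]}")
    lines.append(f"Target: {query.title} -> {missing_attr}:")
    return "\n".join(lines)


catalog = [
    Product("Acme Trail Running Shoe Blue", "SHOES", "Acme", 500, {"material": "mesh"}),
    Product("Acme Office Chair", "CHAIR", "Acme", 900, {"material": "leather"}),
    Product("Zeta Running Shoe", "SHOES", "Zeta", 800, {"material": "synthetic"}),
    Product("Acme Hiking Boot", "SHOES", "Acme", 100, {}),
]
query = Product("Acme Trail Running Shoe Red", "SHOES", "Acme", 0)
examples = retrieve_examples(query, catalog, "material")
prompt = build_prompt(query, examples, "material")
print(prompt)
```

In this toy catalog the chair is dropped by the product-type filter and the hiking boot is dropped because its `material` is empty, so only the two attribute-filled shoes reach the prompt. A production system would presumably replace the character-level similarity with a multilingual text retriever.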
Anthology ID:
2025.emnlp-industry.18
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
267–275
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.18/
Cite (ACL):
Bryan Zhang, Suleiman A. Khan, and Stephan Walter. 2025. Leveraging Product Catalog Patterns for Multilingual E-commerce Product Attribute Prediction. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 267–275, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
Leveraging Product Catalog Patterns for Multilingual E-commerce Product Attribute Prediction (Zhang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.18.pdf