Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce
Pradipto Das, Yandi Xia, Aaron Levine, Giuseppe Di Fabbrizio, Ankur Datta
Abstract
The cataloging of product listings through taxonomy categorization is a fundamental problem for any e-commerce marketplace, with applications ranging from personalized search recommendations to query understanding. However, manual and rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for categorizing listings in both English and Japanese product catalogs. We show empirically that a combination of words from product titles, navigational breadcrumbs, and list prices, when available, improves results significantly. We outline a novel method using correspondence topic models and a lightweight manual process to reduce noise from mislabeled data in the training set. We contrast linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs), and show that GBTs and CNNs yield the highest gains in error reduction. Finally, we show that GBTs applied in a language-agnostic way to a large-scale Japanese e-commerce dataset improve taxonomy categorization performance over the current state of the art, which is based on deep belief network models.
- Anthology ID:
- E17-1091
- Volume:
- Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 969–979
- URL:
- https://aclanthology.org/E17-1091
- Cite (ACL):
- Pradipto Das, Yandi Xia, Aaron Levine, Giuseppe Di Fabbrizio, and Ankur Datta. 2017. Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 969–979, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce (Das et al., EACL 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/E17-1091.pdf