Abstract
Product matching is the task of matching a seller-listed item to the appropriate product. It is a critical task for e-commerce platforms, and the approach must be efficient enough to run at large scale. Dual encoders have recently become common practice for product matching because of their high performance and computational efficiency. In this paper, we propose a two-stage training procedure for the dual encoder model. Stage 1 trains a dual encoder to identify the more informative training data. Stage 2 then trains on this more informative data to obtain a better dual encoder model. This technique is a learned approach to building training data. We evaluate this retrieval-enhanced training on two datasets: the publicly available Large-Scale Product Matching dataset and a real-world e-commerce dataset containing 47 million products. Experimental results show that our approach improves F1 by 2% on the public dataset and by 9% on the real-world e-commerce dataset.

- Anthology ID:
- 2023.emnlp-industry.22
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Mingxuan Wang, Imed Zitouni
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 216–222
- URL:
- https://aclanthology.org/2023.emnlp-industry.22
- DOI:
- 10.18653/v1/2023.emnlp-industry.22
- Cite (ACL):
- Justin Chiu. 2023. Retrieval-Enhanced Dual Encoder Training for Product Matching. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 216–222, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Retrieval-Enhanced Dual Encoder Training for Product Matching (Chiu, EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.emnlp-industry.22.pdf
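The two-stage training described in the abstract can be illustrated with a minimal sketch of the data-building step. The abstract does not say exactly how stage 1 identifies the more informative training data, so this sketch assumes retrieval-based hard-negative mining: the stage-1 encoder retrieves the top-k products for each seller offer, and retrieved products that are not the gold match become hard negatives for stage 2. The `encode` function (a character-trigram stand-in for a trained neural encoder), `mine_hard_negatives`, and the toy data are all hypothetical and not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter


def encode(text: str) -> dict[str, float]:
    """Hypothetical stand-in for the trained stage-1 encoder: an L2-normalized
    bag of character trigrams. A real dual encoder would use a neural model."""
    padded = f"  {text.lower()} "
    grams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are L2-normalized, so their dot product is the cosine similarity.
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def mine_hard_negatives(
    offers: dict[str, str],
    products: dict[str, str],
    gold: dict[str, str],
    k: int = 2,
) -> list[tuple[str, str, str]]:
    """Assumed stage-1 data selection: retrieve the top-k products for each offer
    with the stage-1 encoder and keep retrieved non-matches as hard negatives.
    Returns (offer_id, positive_product_id, negative_product_id) triples."""
    # Products are encoded independently of offers (the dual encoder property),
    # so in practice these vectors could be precomputed and indexed offline.
    product_vecs = {pid: encode(title) for pid, title in products.items()}
    triples = []
    for oid, offer_text in offers.items():
        query = encode(offer_text)
        ranked = sorted(
            product_vecs,
            key=lambda pid: cosine(query, product_vecs[pid]),
            reverse=True,
        )
        for pid in ranked[:k]:
            if pid != gold[oid]:
                triples.append((oid, gold[oid], pid))
    return triples


if __name__ == "__main__":
    products = {
        "p1": "Apple iPhone 13 128GB Blue",
        "p2": "Apple iPhone 13 256GB Blue",
        "p3": "Samsung Galaxy S21 128GB",
    }
    offers = {"o1": "iPhone 13 128 GB blue (Apple)"}
    gold = {"o1": "p1"}
    print(mine_hard_negatives(offers, products, gold, k=2))
```

In this toy run the near-duplicate "256GB" listing is retrieved alongside the correct product and becomes the hard negative, while the unrelated Samsung listing is not selected. Stage 2 would then train the dual encoder on such triples, for example with a contrastive or triplet loss; that training loop is omitted here.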