GSID: Generative Semantic Indexing for E-Commerce Product Understanding

Haiyang Yang, Qinye Xie, Qingheng Zhang, Chen Li Yu, Huike Zou, Chengbao Lian, Shuguang Han, Fei Huang, Jufeng Chen, Bo Zheng


Abstract
Structured representation of product information is a major bottleneck for the efficiency of e-commerce platforms, especially second-hand e-commerce platforms. Currently, most product information is organized by manually curated product categories and attributes, which often fail to adequately cover long-tail products and do not align well with buyer preferences. To address these problems, we propose Generative Semantic Indexing (GSID), a data-driven approach to generating structured product representations. GSID consists of two key components: (1) pre-training on unstructured product metadata to learn in-domain semantic embeddings, and (2) generating more effective semantic codes tailored for downstream product-centric applications. Extensive experiments validate the effectiveness of GSID, and it has been successfully deployed on a real-world e-commerce platform, achieving promising results on product understanding and other downstream tasks.
Anthology ID:
2025.emnlp-industry.78
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1113–1121
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.78/
Cite (ACL):
Haiyang Yang, Qinye Xie, Qingheng Zhang, Chen Li Yu, Huike Zou, Chengbao Lian, Shuguang Han, Fei Huang, Jufeng Chen, and Bo Zheng. 2025. GSID: Generative Semantic Indexing for E-Commerce Product Understanding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1113–1121, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
GSID: Generative Semantic Indexing for E-Commerce Product Understanding (Yang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.78.pdf