Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification

Huike Zou, Haiyang Yang, Yindu Su, Chen Li Yu, Qinye Xie, Chengbao Lian, Qingheng Zhang, Shuguang Han, Fei Huang, Jufeng Chen


Abstract
Identifying attribute values from product profiles, a task we call Product Attribute Value Identification (PAVI), is key to improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as cascading errors, the inability to handle out-of-distribution (OOD) attribute values, and a lack of generalization capability. To address these limitations, we introduce Multi-Value-Product Retrieval-Augmented Generation (MVP-RAG), which combines the strengths of the retrieval, generation, and classification paradigms. MVP-RAG defines PAVI as a retrieval-generation task in which the product title description serves as the query, and products and attribute values act as the corpus. It first retrieves similar products of the same category and candidate attribute values, and then generates the standardized attribute values. The key advantages of this work are: (1) a multi-level retrieval scheme that treats products and attribute values as distinct hierarchical levels in the PAVI domain; (2) attribute value generation with a large language model, which significantly alleviates the OOD problem; and (3) successful deployment in a real-world industrial environment. Extensive experimental results on the dataset demonstrate that the proposed method outperforms the state-of-the-art baselines.
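The retrieve-then-generate flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy token-overlap (Jaccard) score stands in for whatever retriever MVP-RAG actually uses, and `Product`, `retrieve_products`, `retrieve_values`, and `build_prompt` are hypothetical names. Level 1 retrieves similar products restricted to the query's category; level 2 retrieves candidate standardized values; the assembled context would then be passed to an LLM (not called here), which may output a value outside the candidate list, the property the abstract credits with easing the OOD problem.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical corpus entry: a title, its category, and its standardized values."""
    title: str
    category: str
    values: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Assumption: lower-cased whitespace tokens stand in for a learned encoder.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Toy similarity: |A ∩ B| / |A ∪ B| over token sets.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_products(query: str, category: str, corpus: list[Product], k: int = 2) -> list[Product]:
    """Level 1: top-k most similar products, restricted to the query's category."""
    same_category = [p for p in corpus if p.category == category]
    ranked = sorted(same_category, key=lambda p: jaccard(query, p.title), reverse=True)
    return ranked[:k]


def retrieve_values(query: str, vocabulary: list[str], k: int = 3) -> list[str]:
    """Level 2: top-k standardized attribute values with nonzero similarity to the query."""
    scored = sorted(((jaccard(query, v), v) for v in vocabulary), key=lambda s: s[0], reverse=True)
    return [v for score, v in scored[:k] if score > 0]


def build_prompt(query: str, products: list[Product], values: list[str]) -> str:
    """Assemble retrieved context for a generator LLM (hypothetical prompt format)."""
    lines = [f"Product title: {query}", "Similar products:"]
    lines += [f"- {p.title} -> {', '.join(p.values)}" for p in products]
    lines.append("Candidate values: " + ", ".join(values))
    lines.append("Output the standardized attribute values; values outside the candidates are allowed.")
    return "\n".join(lines)


corpus = [
    Product("slim fit cotton t-shirt white", "apparel", ["cotton", "slim fit"]),
    Product("oversized linen shirt beige", "apparel", ["linen", "oversized"]),
    Product("cotton bath towel set", "home", ["cotton"]),
]
vocabulary = ["cotton", "linen", "polyester", "slim fit", "oversized"]
query = "mens slim fit cotton t-shirt blue"

products = retrieve_products(query, "apparel", corpus)
values = retrieve_values(query, vocabulary)
print(build_prompt(query, products, values))
```

Restricting level 1 to the same category mirrors the abstract's "similar products of the same category"; in this toy run the towel from the "home" category is never retrieved even though it shares the token "cotton".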
Anthology ID: 2025.emnlp-industry.147
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2096–2105
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.147/
Cite (ACL): Huike Zou, Haiyang Yang, Yindu Su, Chen Li Yu, Qinye Xie, Chengbao Lian, Qingheng Zhang, Shuguang Han, Fei Huang, and Jufeng Chen. 2025. Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2096–2105, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification (Zou et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.147.pdf