FashionKLIP: Enhancing E-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph
Xiaodan Wang, Chengyu Wang, Lei Li, Zhixu Li, Ben Chen, Linbo Jin, Jun Huang, Yanghua Xiao, Ming Gao
Abstract
Image-text retrieval is a core task in the multi-modal domain and has attracted considerable attention from both the research and industry communities. Recently, the rapid growth of visual-language pre-trained (VLP) models has greatly enhanced the performance of cross-modal retrieval. However, the fine-grained interactions between objects from different modalities are far from well-established. This issue is more severe in the e-commerce domain, which lacks sufficient training data and fine-grained cross-modal knowledge. To alleviate the problem, this paper proposes FashionKLIP, a novel knowledge-enhanced VLP model for e-commerce. We first automatically establish a multi-modal conceptual knowledge graph from large-scale e-commerce image-text data, and then inject this prior knowledge into the VLP model to align the modalities at the conceptual level. Experiments conducted on a public benchmark dataset demonstrate that FashionKLIP improves e-commerce image-text retrieval over state-of-the-art VLP models by a large margin. The application of the method in real industrial scenarios also demonstrates the feasibility and efficiency of FashionKLIP.
- Anthology ID: 2023.acl-industry.16
- Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 149–158
- URL: https://preview.aclanthology.org/icon-24-ingestion/2023.acl-industry.16/
- DOI: 10.18653/v1/2023.acl-industry.16
- Cite (ACL): Xiaodan Wang, Chengyu Wang, Lei Li, Zhixu Li, Ben Chen, Linbo Jin, Jun Huang, Yanghua Xiao, and Ming Gao. 2023. FashionKLIP: Enhancing E-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 149–158, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): FashionKLIP: Enhancing E-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph (Wang et al., ACL 2023)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2023.acl-industry.16.pdf