KG-FLIP: Knowledge-guided Fashion-domain Language-Image Pre-training for E-commerce

Qinjin Jia; Yang Liu; Daoping Wu; Shaoyuan Xu; Huidong Liu; Jinmiao Fu; Roland Vollgraf; Bryan Wang

doi:10.18653/v1/2023.acl-industry.9

Abstract

Various Vision-Language Pre-training (VLP) models (e.g., CLIP, BLIP) have sprung up and dramatically advanced results on public general-domain benchmarks (e.g., COCO, Flickr30k). Such models usually learn cross-modal alignment from large-scale, well-aligned image-text datasets without leveraging external knowledge. Adapting these models to downstream applications in specific domains like fashion requires fine-grained in-domain image-text corpora, which are usually less semantically aligned and small in scale, and which therefore call for efficient pre-training strategies. In this paper, we propose a knowledge-guided fashion-domain language-image pre-training (FLIP) framework that focuses on learning fine-grained representations in the e-commerce domain and utilizes external knowledge (i.e., product attribute schema) to improve pre-training efficiency. Experiments demonstrate that FLIP outperforms previous state-of-the-art VLP models on Amazon data and on the Fashion-Gen dataset by large margins. FLIP has been successfully deployed in the Amazon catalog system to backfill missing attributes and improve the customer shopping experience.

Anthology ID: 2023.acl-industry.9
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 81–88
URL: https://aclanthology.org/2023.acl-industry.9
DOI: 10.18653/v1/2023.acl-industry.9
Cite (ACL): Qinjin Jia, Yang Liu, Daoping Wu, Shaoyuan Xu, Huidong Liu, Jinmiao Fu, Roland Vollgraf, and Bryan Wang. 2023. KG-FLIP: Knowledge-guided Fashion-domain Language-Image Pre-training for E-commerce. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 81–88, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): KG-FLIP: Knowledge-guided Fashion-domain Language-Image Pre-training for E-commerce (Jia et al., ACL 2023)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2023.acl-industry.9.pdf
