KG-FLIP: Knowledge-guided Fashion-domain Language-Image Pre-training for E-commerce
Qinjin Jia | Yang Liu | Daoping Wu | Shaoyuan Xu | Huidong Liu | Jinmiao Fu | Roland Vollgraf | Bryan Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), 2023
Various Vision-Language Pre-training (VLP) models (e.g., CLIP, BLIP) have sprung up and dramatically advanced the benchmarks for public general-domain datasets (e.g., COCO, Flickr30k). Such models usually learn the cross-modal alignment from large-scale, well-aligned image-text datasets without leveraging external knowledge. Adapting these models to downstream applications in specific domains like fashion requires a fine-grained in-domain image-text corpus, which is usually less semantically aligned and smaller in scale, and thus calls for efficient pre-training strategies. In this paper, we propose a knowledge-guided fashion-domain language-image pre-training (FLIP) framework that focuses on learning fine-grained representations in the e-commerce domain and utilizes external knowledge (i.e., product attribute schema) to improve pre-training efficiency. Experiments demonstrate that FLIP outperforms previous state-of-the-art VLP models on Amazon data and on the Fashion-Gen dataset by large margins. FLIP has been successfully deployed in the Amazon catalog system to backfill missing attributes and improve the customer shopping experience.
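The abstract describes the approach only at a high level. As a rough illustration of what a knowledge-guided VLP objective of this kind can look like, the sketch below pairs a standard CLIP-style image-text contrastive loss with a hypothetical auxiliary head that predicts schema attributes (e.g., color, sleeve length) from a fused representation. All function, module, and parameter names here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def itc_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style image-text contrastive (ITC) loss.

    image_emb, text_emb: (batch, dim) L2-normalized embeddings; the i-th
    image and i-th text form the only positive pair in the batch.
    """
    logits = image_emb @ text_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)              # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)          # text -> matching image
    return (loss_i2t + loss_t2i) / 2

def attribute_guided_loss(fused_emb: torch.Tensor,
                          attr_head: nn.Module,
                          attr_labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical knowledge-guided auxiliary loss: classify a product
    attribute from the fused image-text representation, injecting the
    external attribute schema into pre-training."""
    logits = attr_head(fused_emb)                            # (B, num_attr_values)
    return F.cross_entropy(logits, attr_labels)

# Combined objective (weights are assumptions, not reported values):
# total = itc_loss(img, txt) + 0.5 * attribute_guided_loss(fused, head, labels)
```

The design intuition is that the contrastive term aligns the two modalities globally, while the attribute term forces the model to attend to the fine-grained, schema-defined product properties that matter in the fashion domain.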