Shengdong Zhang


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2022

pdf bib
Exploring Compositional Image Retrieval with Hybrid Compositional Learning and Heuristic Negative Mining
Chao Wang | Ehsan Nezhadarya | Tanmana Sadhu | Shengdong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Compositional image retrieval (CIR) is a challenging retrieval task, where the query is composed of a reference image and a modification text, and the target is another image reflecting the modification to the reference image. Due to the great success of the pre-trained vision-and-language model CLIP and its favorable applicability to large-scale retrieval tasks, we propose a CIR model HyCoLe-HNM with CLIP as the backbone. In HyCoLe-HNM, we follow the contrastive pre-training method of CLIP to perform cross-modal representation learning. On this basis, we propose a hybrid compositional learning mechanism, which includes both image compositional learning and text compositional learning. In hybrid compositional learning, we borrow a gated fusion mechanism from a question answering model to perform compositional fusion, and propose a heuristic negative mining method to filter negative samples. Privileged information in the form of image-related texts is utilized in cross-modal representation learning and hybrid compositional learning. Experimental results show that HyCoLe-HNM achieves state-of-the-art performance on three CIR datasets, namely FashionIQ, Fashion200K, and MIT-States.