Jiankang Deng



2024

RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu | Kaicheng Yang | Xiang An | Ziyong Feng | Dongnan Liu | Weidong Cai | Jiankang Deng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from the web. This paper further explores CLIP from the perspectives of data and model architecture. To mitigate the impact of noisy data and enhance the quality of large-scale image-text data crawled from the internet, we introduce a diverse description generation framework that leverages Large Language Models (LLMs) to combine and refine information from web-based image-text pairs, synthetic captions, and detection tags. Additionally, we propose RWKV-CLIP, the first RWKV-driven vision-language representation learning model that combines the effective parallel training of transformers with the efficient inference of RNNs. Extensive experiments across different model scales and pre-training datasets demonstrate that RWKV-CLIP is a robust vision-language representation learner that achieves state-of-the-art performance across multiple downstream tasks, including linear probing, zero-shot classification, and zero-shot image-text retrieval. To facilitate future research, the code and pre-trained models are released at https://github.com/deepglint/RWKV-CLIP.
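For readers unfamiliar with the zero-shot classification protocol mentioned in the abstract, the sketch below illustrates the general CLIP-style mechanism: class names are turned into text prompts, both modalities are encoded and L2-normalized, and cosine similarity between image and prompt embeddings yields class probabilities. The encoder and tokenizer here are hypothetical stand-ins, not the released RWKV-CLIP architecture or API.

```python
# Minimal sketch of CLIP-style zero-shot classification.
# DummyImageEncoder, DummyTextEncoder, and fake_tokenizer are placeholders,
# not components of the RWKV-CLIP release.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DummyImageEncoder(nn.Module):
    """Placeholder vision tower: maps an image tensor to an embedding."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(3 * 224 * 224, embed_dim)

    def forward(self, images):
        return self.proj(images.flatten(1))


class DummyTextEncoder(nn.Module):
    """Placeholder text tower: maps token ids to an embedding."""
    def __init__(self, vocab_size=49408, embed_dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)

    def forward(self, token_ids):
        return self.embed(token_ids)


@torch.no_grad()
def zero_shot_classify(image_encoder, text_encoder, images,
                       class_prompts, tokenizer, temperature=0.01):
    """Score each image against text prompts built from class names."""
    # Encode and L2-normalize both modalities so dot products are cosine similarities.
    image_feats = F.normalize(image_encoder(images), dim=-1)
    text_feats = F.normalize(text_encoder(tokenizer(class_prompts)), dim=-1)
    # Temperature-scaled similarity logits, softmax over the candidate classes.
    logits = image_feats @ text_feats.t() / temperature
    return logits.softmax(dim=-1)


def fake_tokenizer(prompts, max_len=8):
    """Stand-in tokenizer returning random token ids of fixed length."""
    return torch.randint(0, 49408, (len(prompts), max_len))


# Toy usage with random inputs.
probs = zero_shot_classify(
    DummyImageEncoder(), DummyTextEncoder(),
    images=torch.randn(4, 3, 224, 224),
    class_prompts=[f"a photo of a {c}" for c in ["cat", "dog", "car"]],
    tokenizer=fake_tokenizer,
)
print(probs.shape)  # (4, 3): per-image probabilities over the 3 class prompts
```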