@inproceedings{wang-etal-2022-rethinking,
title = "Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning",
author = "Wang, Zhenhailong and
Yu, Hang and
Li, Manling and
Zhao, Han and
Ji, Heng",
booktitle = "Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models",
month = oct,
year = "2022",
address = "Virtual",
publisher = "International Conference on Computational Linguistics",
url = "https://aclanthology.org/2022.mmmpie-1.2",
pages = "7--14",
abstract = "Despite achieving state-of-the-art zero-shot performance, existing vision-language models still fall short of few-shot transfer ability on domain-specific problems. Classical fine-tuning often fails to prevent highly expressive models from exploiting spurious correlations. Although model-agnostic meta-learning (MAML) presents as a natural alternative for few-shot transfer learning, the expensive computation due to implicit second-order optimization limits its use on large-scale vision-language models such as CLIP. While much literature has been devoted to exploring alternative optimization strategies, we identify another essential aspect towards effective few-shot transfer learning, task sampling, which is previously only be viewed as part of data pre-processing in MAML. To show the impact of task sampling, we propose a simple algorithm, Model-Agnostic Multitask Fine-tuning (MAMF), which differentiates classical fine-tuning only on uniformly sampling multiple tasks. Despite its simplicity, we show that MAMF consistently outperforms classical fine-tuning on five few-shot image classification tasks. We further show that the effectiveness of the bi-level optimization in MAML is highly sensitive to the zero-shot performance of a task in the context of few-shot vision-language classification. The goal of this paper is to provide new insights on what makes few-shot learning work, and encourage more research into investigating better task sampling strategies.",
}
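
The abstract characterizes MAMF as differing from classical fine-tuning only in that it uniformly samples multiple few-shot tasks and then applies ordinary gradient updates, with no MAML-style bi-level (inner/outer loop) optimization. The sketch below illustrates that idea in PyTorch; the function name `mamf_finetune`, the hyperparameters, and the assumption that `model` maps an image batch to logits over the sampled task's classes are all illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn.functional as F

def mamf_finetune(model, support_set, num_tasks=8, n_way=5, k_shot=5,
                  steps_per_task=10, lr=1e-5):
    """Sketch of Model-Agnostic Multitask Fine-tuning (MAMF).

    Unlike MAML, there is no second-order inner/outer optimization:
    tasks are sampled uniformly, and each sampled task is trained
    with plain first-order fine-tuning.

    Assumptions (not from the paper): `support_set` maps a class
    label to a list of preprocessed image tensors, and `model`
    returns logits over the n_way classes of the current task.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    classes = list(support_set.keys())
    for _ in range(num_tasks):
        # Uniform task sampling: pick an N-way K-shot episode at random.
        task_classes = random.sample(classes, n_way)
        batch = [(x, label) for label, c in enumerate(task_classes)
                 for x in random.sample(support_set[c], k_shot)]
        xs = torch.stack([x for x, _ in batch])
        ys = torch.tensor([y for _, y in batch])
        # Classical fine-tuning on the sampled task's few-shot batch.
        for _ in range(steps_per_task):
            loss = F.cross_entropy(model(xs), ys)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

For a CLIP-style model, `model(xs)` would in practice be the cosine similarities between image embeddings and the text embeddings of the sampled task's class prompts; the sketch abstracts that detail behind a single callable.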