Jinxiang Lai



2025

HomoGraphAdapter: A Homogeneous Graph Neural Network as an Effective Adapter for Vision-Language Models
Chuan He | Zhuozhao Li | Song Guo | Xiaocheng Lu | Jinxiang Lai
Findings of the Association for Computational Linguistics: EMNLP 2025

Vision-Language Models (VLMs), such as CLIP, have shown significant advances in recognizing visual concepts through natural language guidance. However, adapting these models to downstream tasks remains challenging. Existing adaptation methods either overlook the structural knowledge between the text and image modalities or construct overly complex graphs containing redundant information for alignment, leading to suboptimal classification performance and increased computational overhead. This paper proposes a novel adapter-tuning method named Homogeneous Graph Adapter (HomoGraphAdapter), which transforms diverse textual and visual descriptions into a unified set of node representations and establishes edges between nodes for intra-modal and cross-modal semantic alignment. We leverage a straightforward homogeneous Graph Neural Network (GNN) to adapt positive and negative classifiers across the text and image modalities. These classifiers jointly improve performance on few-shot classification and out-of-distribution (OOD) generalization. Compared with the state-of-the-art approach HeGraphAdapter, HomoGraphAdapter improves classification accuracy by an average of 1.51% for 1-shot and 0.74% for 16-shot settings across 11 datasets, while also reducing both precomputation time and training time.
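
To illustrate the core idea described in the abstract (treating text and image embeddings as a single homogeneous node set and refining them with a shared GNN layer before cosine-similarity classification), here is a minimal PyTorch-style sketch. It is an assumption-laden simplification, not the paper's implementation: all names (HomoGraphAdapterSketch, build_adjacency, the k-NN graph construction, the single residual GCN layer) are illustrative, and the positive/negative classifier pair from the paper is reduced to a single positive classifier.

    # Hypothetical sketch of a homogeneous-GNN adapter over frozen CLIP features.
    # All names and design details here are illustrative, not taken from the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HomoGraphAdapterSketch(nn.Module):
        """Treat text-class embeddings and image embeddings as one homogeneous node set,
        propagate over a similarity-based graph, and classify by cosine similarity."""

        def __init__(self, dim: int, knn: int = 5):
            super().__init__()
            self.knn = knn
            # One shared weight for all nodes: a homogeneous graph has a single node type.
            self.gcn = nn.Linear(dim, dim, bias=False)

        def build_adjacency(self, nodes: torch.Tensor) -> torch.Tensor:
            # Connect each node to its k most similar neighbors; intra-modal and
            # cross-modal edges are treated identically (requires num_nodes >= knn).
            unit = F.normalize(nodes, dim=-1)
            sim = unit @ unit.t()
            topk = sim.topk(self.knn, dim=-1).indices
            adj = torch.zeros_like(sim).scatter_(1, topk, 1.0)
            adj = ((adj + adj.t()) > 0).float()            # symmetrize
            deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
            return adj / deg                               # row-normalize

        def forward(self, text_nodes: torch.Tensor, image_nodes: torch.Tensor):
            nodes = torch.cat([text_nodes, image_nodes], dim=0)   # unified node set
            adj = self.build_adjacency(nodes)
            refined = F.relu(self.gcn(adj @ nodes)) + nodes       # one propagation step + residual
            t = refined[: text_nodes.size(0)]
            v = refined[text_nodes.size(0):]
            # Image-vs-class cosine similarities serve as classification logits.
            return F.normalize(v, dim=-1) @ F.normalize(t, dim=-1).t()

    # Illustrative usage with frozen CLIP features (dimensions assumed, e.g. 512-d):
    #   text_nodes:  (num_classes, 512) prompt embeddings
    #   image_nodes: (batch, 512) image embeddings
    #   adapter = HomoGraphAdapterSketch(dim=512)
    #   logits = adapter(text_nodes, image_nodes)

The sketch only conveys the structural idea of a single-node-type graph shared across modalities; the paper's graph construction, negative classifier, and training objective are not reproduced here.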