HomoGraphAdapter: A Homogeneous Graph Neural Network as an Effective Adapter for Vision-Language Models

Chuan He, Zhuozhao Li, Song Guo, Xiaocheng Lu, Jinxiang Lai


Abstract
Vision-Language Models (VLMs), such as CLIP, have made significant advances in recognizing visual concepts through natural language guidance. However, adapting these models to downstream tasks remains challenging. Existing adaptation methods either overlook the structural knowledge between the text and image modalities or build overly complex graphs that contain redundant information for alignment, leading to suboptimal classification performance and increased computational overhead. This paper proposes a novel adapter-tuning method named Homogeneous Graph Adapter (HomoGraphAdapter), which transforms diverse textual and visual descriptions into a unified set of node representations and establishes edges between nodes for inter-modal and cross-modal semantic alignment. We leverage a straightforward homogeneous Graph Neural Network (GNN) to adapt positive and negative classifiers across the text and image modalities. Together, these classifiers improve performance on few-shot classification and out-of-distribution (OOD) generalization. Compared with the state-of-the-art (SOTA) approach HeGraphAdapter, HomoGraphAdapter improves classification accuracy by an average of 1.51% for 1-shot and 0.74% for 16-shot on 11 datasets, while also reducing both precomputation time and training time.
Anthology ID:
2025.findings-emnlp.1270
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23400–23414
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1270/
DOI:
10.18653/v1/2025.findings-emnlp.1270
Cite (ACL):
Chuan He, Zhuozhao Li, Song Guo, Xiaocheng Lu, and Jinxiang Lai. 2025. HomoGraphAdapter: A Homogeneous Graph Neural Network as an Effective Adapter for Vision-Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23400–23414, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
HomoGraphAdapter: A Homogeneous Graph Neural Network as an Effective Adapter for Vision-Language Models (He et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1270.pdf
Checklist:
 2025.findings-emnlp.1270.checklist.pdf