Yunde Jia


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
In-Context Compositional Generalization for Large Vision-Language Models
Chuanhao Li | Chenchen Jing | Zhen Li | Mingliang Zhai | Yuwei Wu | Yunde Jia
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent work has revealed that in-context learning for large language models exhibits compositional generalization capacity, which can be enhanced by selecting in-context demonstrations similar to test cases to provide contextual information. However, how to exhibit in-context compositional generalization (ICCG) of large vision-language models (LVLMs) is non-trival. Due to the inherent asymmetry between visual and linguistic modalities, ICCG in LVLMs faces an inevitable challenge—redundant information on the visual modality. The redundant information affects in-context learning from two aspects: (1) Similarity calculation may be dominated by redundant information, resulting in sub-optimal demonstration selection. (2) Redundant information in in-context demonstrations brings misleading contextual information to in-context learning. To alleviate these problems, we propose a demonstration selection method to achieve ICCG for LVLMs, by considering two key factors of demonstrations: content and structure, from a multimodal perspective. Specifically, we design a diversity-coverage-based matching score to select demonstrations with maximum coverage, and avoid selecting demonstrations with redundant information via their content redundancy and structural complexity. We build a GQA-ICCG dataset to simulate the ICCG setting, and conduct experiments on GQA-ICCG and the VQA v2 dataset. Experimental results demonstrate the effectiveness of our method.