Jing Yang

Other people with similar names: Jing Yang, Jing Yang (Campinas)

2026

The core challenge of Compositional Zero-Shot Learning (CZSL) lies in learning representations of sub-concepts (attributes and objects) from seen compositions and using them to recognize unseen compositions. Most existing CZSL methods focus on prompt optimization on the textual side and overlook the insufficient disentanglement of visual attribute–object sub-concepts under a text-centric paradigm. To address this, we propose DMSD, a Dual-Modal Semantic Disentanglement framework that jointly models visual and textual information to achieve effective sub-concept disentanglement. Specifically, DMSD introduces a Contextual Prompt Space in which both visual and textual modalities are modeled under unified contextual semantic representations, enhancing their alignment at the latent semantic level. We also design Visual Sub-concept Prototypes that explicitly extract and model visual sub-concept features, improving the independence and discriminability of visual sub-concept representations. Finally, to achieve fine-grained alignment between visual and textual sub-concepts, we propose a Class-Centroid Bridging Module that guides class centroids toward the textual semantic space, ensuring cross-modal semantic consistency. Extensive experiments on three benchmark datasets (MIT-States, UT-Zappos, and C-GQA) demonstrate that DMSD achieves state-of-the-art performance in both closed-world and open-world settings. Our code is available at https://anonymous.4open.science/r/DMSD-9CC4.
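The abstract does not specify how class centroids are guided toward the textual semantic space, so the following is only an illustrative sketch of one common way such an objective could be written, not the DMSD implementation. It assumes that a class centroid is the mean visual feature of a sub-concept class and that alignment is measured by cosine distance to that class's text embedding. The function names (`class_centroids`, `bridging_loss`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(features, labels):
    """Return the mean visual feature vector for each class label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cosine(a, b):
    """Cosine similarity of two vectors (assumes both are non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bridging_loss(centroids, text_embeddings):
    """Average cosine distance between each centroid and its text embedding.

    This is an assumed objective chosen for illustration only.
    """
    losses = [1.0 - cosine(centroids[c], text_embeddings[c]) for c in centroids]
    return sum(losses) / len(losses)


# Toy example: two attribute classes with 2-D visual features.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["red", "red", "wet"]
text = {"red": [1.0, 0.0], "wet": [1.0, 0.0]}

cents = class_centroids(features, labels)
loss = bridging_loss(cents, text)
```

In this toy case the "red" centroid already points in the same direction as its text embedding and contributes nothing to the loss, while the "wet" centroid is orthogonal to its text embedding and contributes the full distance. Minimizing such a term would move misaligned centroids toward the textual space.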

Co-authors

Ruan Xiao li (1)

Venues

Findings (1)
