Abstract
Compositional zero-shot learning aims to recognize both object and attribute categories from images, including novel attribute-object combinations that are not observed during training. A key challenge is correctly identifying unseen compositions without supervision while avoiding reliance on inconsistent associations between class names and visual content. This issue mainly arises from the use of fixed text embeddings that are directly tied to class labels. To overcome these challenges, we propose a novel framework that learns primitive disparities without depending on textual labels. Our method integrates an embedding-based strategy with a generative framework, an approach that has received limited attention in compositional learning. Specifically, primitive classes are identified by comparing visual and textual representations in a shared embedding space. To improve visual feature quality, we introduce a region-specific feature aggregation strategy that effectively captures attribute-related information. In addition, to mitigate data scarcity in zero-shot learning scenarios, we design a generative module that synthesizes unseen features using metric-learning-based triplets and feature disparity modeling with learnable class features. This module enables feature synthesis in a unified visual space, reducing dependence on text-driven knowledge commonly used in existing methods. The synthesized features are then used to jointly refine both visual and textual representations, leading to improved generalization performance. Extensive experiments on four widely used benchmark datasets demonstrate that our method outperforms state-of-the-art approaches. The code will be released upon publication.
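As a rough illustration of two ideas named in the abstract (identifying a primitive class by comparing a visual feature against class representations in a shared embedding space, and a metric-learning triplet objective), the following minimal sketch may help. It is not the authors' implementation: the cosine similarity, the margin value, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_primitive(visual, class_embeddings):
    """Return the class whose embedding is most similar to the visual feature."""
    return max(class_embeddings, key=lambda name: cosine(visual, class_embeddings[name]))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance; the margin is a hypothetical value."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy, hand-made attribute embeddings (hypothetical, not learned).
attr_embeddings = {"wet": [1.0, 0.0], "dry": [0.0, 1.0]}
feature = [0.9, 0.1]  # stand-in for an aggregated visual feature

predicted = classify_primitive(feature, attr_embeddings)
loss = triplet_loss(feature, attr_embeddings["wet"], attr_embeddings["dry"])
```

In this toy setup the feature lies close to the "wet" embedding, so the triplet loss is zero when "wet" is the positive and becomes positive if the roles are swapped. In the paper's setting the class features are described as learnable, whereas here they are fixed by hand.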
| Original language | English |
|---|---|
| Article number | 115278 |
| Journal | Knowledge-Based Systems |
| Volume | 336 |
| DOIs | |
| State | Published - 15 Mar 2026 |
Keywords
- Compositional zero-shot learning
- Feature augmentation
- Representation learning
- Zero-shot learning
Fingerprint
Dive into the research topics of 'Generative compositional zero-Shot learning using learnable primitive disparity'. Together they form a unique fingerprint.