Unsupervised contrastive learning using out-of-distribution data for long-tailed dataset

Cuong Manh Hoang, Yeejin Lee, Byeongkeun Kang

Research output: Contribution to journal › Article › peer-review

Abstract

This work addresses self-supervised learning (SSL) on long-tailed datasets, with the aim of learning balanced and well-separated representations for downstream tasks such as image classification. This task is crucial because the real world contains numerous object categories whose distributions are inherently imbalanced. Towards robust SSL on a class-imbalanced dataset, we investigate leveraging a network trained using unlabeled out-of-distribution (OOD) data, which are widely available online. We first train a network using both in-domain (ID) and sampled OOD data by back-propagating the proposed pseudo semantic discrimination loss alongside a domain discrimination loss. The OOD data sampling and loss functions are designed to learn a balanced and well-separated embedding space. Subsequently, we further optimize the network on ID data by unsupervised contrastive learning while using the previously trained network as a guiding network. The guiding network is used to select positive/negative samples and to control the strengths of the attractive/repulsive forces in contrastive learning. We also distil and transfer its embedding space to the training network to maintain balancedness and separability. Through experiments on four publicly available long-tailed datasets, we demonstrate that the proposed method outperforms previous state-of-the-art methods.
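
For readers unfamiliar with contrastive objectives, the following is a minimal sketch of a standard InfoNCE-style loss for a single anchor, extended with optional per-negative weights. It is not the method proposed in the article: the function names, the `neg_weights` parameter, and the temperature value are all assumptions made for illustration. The weights only hint at how an external guiding signal could, in principle, strengthen or weaken the repulsion from individual negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_info_nce(anchor, positive, negatives, neg_weights=None, temperature=0.5):
    """InfoNCE loss for one anchor with optional per-negative weights.

    neg_weights is a hypothetical stand-in for guidance from a second
    network (an assumption, not the article's formulation): a larger
    weight strengthens the repulsion from that negative, and a weight
    of zero ignores it.
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(negatives)  # plain, unweighted InfoNCE
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_term = sum(
        w * math.exp(cosine(anchor, n) / temperature)
        for w, n in zip(neg_weights, negatives)
    )
    return -math.log(pos_term / (pos_term + neg_term))
```

With all weights equal to one, this reduces to the usual InfoNCE loss. Raising the weight of a negative increases its contribution to the denominator and therefore the loss incurred by staying close to it.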

Original language: English
Article number: 130779
Journal: Neurocomputing
Volume: 649
State: Published - 7 Oct 2025

Keywords

  • Convolutional neural networks
  • Imbalanced data
  • Long-tailed data
  • Self-supervised learning
  • Unsupervised contrastive learning
