Metadata Enriched Multi-Instance Contrastive Learning for High-Quality Facial Skin Visual Representations

Jihyo Kim, Sungchul Kim, Seungwon Seo, Bumsoo Kim, Daejeong Mun, Hoonjae Lee, Sangheum Hwang

Research output: Contribution to journal › Article › peer-review

Abstract

Using self-supervised learning to learn meaningful representations from unlabeled data can be a cost-effective strategy, particularly in medical domains where expert labeling is expensive. Contrastive learning typically employs a single contrastive relationship based on individual instances. However, for data with particular task-related characteristics, such as facial skin images, this approach may be unsuitable for learning useful representations. In this work, we propose a contrastive learning method that learns high-quality facial skin representations useful for various downstream applications related to skin disorders, such as wrinkles and pigmentation. Our method leverages metadata to establish effective multi-instance contrastive relationships specifically for facial skin images. To this end, we employ mini-batches constructed by integrating multiple contrastive relationships, enabling a model to learn the multifaceted features of facial skin. Using a facial skin image dataset, we demonstrate that the proposed method classifies facial wrinkle and pigmentation severity more effectively than conventional contrastive learning. The features learned by the proposed method also adapt well to skin lesion datasets from different sources, demonstrating the transferability of the learned skin representations. Our study highlights the potential of application-specific batch configurations that leverage metadata to enhance the effectiveness of self-supervised learning.
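
The idea of metadata-defined, multi-instance contrastive relationships can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss function or which metadata fields are used. The sketch assumes a hypothetical metadata key (`group_ids`, for example a subject identifier) that marks which images in a mini-batch count as positives for one another, and it substitutes a generic multi-positive, InfoNCE-style loss. The function names and the temperature value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(embeddings, group_ids, temperature=0.1):
    """Average multi-positive contrastive loss over one mini-batch.

    ASSUMPTION: samples sharing a metadata value in `group_ids` are
    treated as positives; all other samples in the batch are negatives.
    """
    n = len(embeddings)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and group_ids[j] == group_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        others = [j for j in range(n) if j != i]
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in others}
        # Numerically stable log-sum-exp over every non-anchor sample.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        # Mean negative log-probability of the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two tight clusters in a 2-D embedding space.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matched = multi_positive_contrastive_loss(batch, ["a", "a", "b", "b"])
mismatched = multi_positive_contrastive_loss(batch, ["a", "b", "a", "b"])
print(matched < mismatched)
```

In this toy batch the loss is lower when the metadata groups coincide with the clusters in embedding space, which is the behavior a metadata-driven batch configuration would aim to encourage during training.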

Original language: English
Article number: 2462389
Journal: Applied Artificial Intelligence
Volume: 39
Issue number: 1
State: Published - 2025
