FairDITA: Disentangled Image-Text Alignment for Fair Skin Cancer Diagnosis

Jiwon Park, Seunggyu Lee, Younghoon Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in deep learning have significantly improved skin cancer classification, yet concerns regarding algorithmic fairness persist because of performance disparities across skin tone groups. Existing methods often attempt to mitigate bias by suppressing sensitive attributes within images; however, they are fundamentally limited by the entanglement of lesion characteristics and skin tone in visual inputs. To address this challenge, we propose a novel contrastive learning framework that leverages explicitly constructed image-text pairs to disentangle lesion condition features from skin tone attributes. Our architecture consists of a shared text encoder and two specialized image encoders that independently align image features with the corresponding textual descriptions of lesion characteristics and skin tone. Furthermore, we measure the semantic distance between lesion condition and skin tone embeddings in both the image- and text-embedding spaces, and we align the representations by matching the distances in the image space to those in the text space. We validated our method on two benchmark datasets, PAD-UFES-20 and Fitzpatrick17k, which span a wide range of skin tones. The experimental results demonstrate that our approach consistently improves both classification accuracy and fairness across multiple evaluation metrics.
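To make the described architecture concrete, the following is a minimal PyTorch sketch of the core idea: two specialized image projection heads (lesion and skin tone), one shared text head, symmetric contrastive alignment of each image view with its textual description, and a loss that matches lesion-tone distances in the image space to those in the text space. The module names, linear projection heads, precomputed backbone features, and the cosine-distance matching term are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FairDITASketch(nn.Module):
    """Hypothetical sketch of disentangled image-text alignment; not the paper's code."""

    def __init__(self, img_dim=512, txt_dim=512, embed_dim=256):
        super().__init__()
        # Two specialized image encoders: one for lesion features, one for skin tone.
        self.lesion_img_enc = nn.Linear(img_dim, embed_dim)
        self.tone_img_enc = nn.Linear(img_dim, embed_dim)
        # A single shared text encoder head used for both description types.
        self.text_enc = nn.Linear(txt_dim, embed_dim)
        # CLIP-style learnable temperature, initialized near log(1 / 0.07).
        self.logit_scale = nn.Parameter(torch.tensor(2.6593))

    @staticmethod
    def _contrastive(img_z, txt_z, scale):
        # Symmetric InfoNCE: matched image-text pairs lie on the diagonal.
        logits = scale * img_z @ txt_z.t()
        targets = torch.arange(img_z.size(0), device=img_z.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    def forward(self, img_feat, lesion_txt_feat, tone_txt_feat):
        # Project each modality into the shared space and L2-normalize.
        z_img_lesion = F.normalize(self.lesion_img_enc(img_feat), dim=-1)
        z_img_tone = F.normalize(self.tone_img_enc(img_feat), dim=-1)
        z_txt_lesion = F.normalize(self.text_enc(lesion_txt_feat), dim=-1)
        z_txt_tone = F.normalize(self.text_enc(tone_txt_feat), dim=-1)
        scale = self.logit_scale.exp()

        # Independently align each image view with its textual description.
        loss_align = (self._contrastive(z_img_lesion, z_txt_lesion, scale)
                      + self._contrastive(z_img_tone, z_txt_tone, scale))

        # Match lesion-tone cosine distances in image space to those in text
        # space, nudging the image encoders toward the disentanglement that
        # the explicitly constructed text descriptions already exhibit.
        d_img = 1.0 - (z_img_lesion * z_img_tone).sum(dim=-1)
        d_txt = 1.0 - (z_txt_lesion * z_txt_tone).sum(dim=-1)
        loss_dist = F.mse_loss(d_img, d_txt.detach())

        return loss_align + loss_dist
```

In this reading, the text space serves as a reference geometry: lesion and skin tone descriptions are constructed independently, so their embedding distances are treated as the disentangled target that the image-side distances are regressed toward.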

Original language: English
Journal: Journal of Imaging Informatics in Medicine
DOIs
State: Accepted/In press - 2025

Keywords

  • Machine learning fairness
  • Multimodal representation learning
  • Skin cancer diagnosis
