TY - JOUR
T1 - FairDITA
T2 - Disentangled Image-Text Alignment for Fair Skin Cancer Diagnosis
AU - Park, Jiwon
AU - Lee, Seunggyu
AU - Lee, Younghoon
N1 - Publisher Copyright:
© The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine 2025.
PY - 2025
Y1 - 2025
AB - Recent advances in deep learning have significantly improved skin cancer classification, yet concerns regarding algorithmic fairness persist because of performance disparities across skin tone groups. Existing methods often attempt to mitigate bias by suppressing sensitive attributes within images. However, they are fundamentally limited by the entanglement of lesion characteristics and skin tone in visual inputs. To address this challenge, we propose a novel contrastive learning framework that leverages explicitly constructed image-text pairs to disentangle lesion-condition features from skin-tone attributes. Our architecture consists of a shared text encoder and two specialized image encoders that independently align image features with the corresponding textual descriptions of lesion characteristics and skin tone. Furthermore, we measure the semantic distance between the lesion-condition and skin-color embeddings in both the image- and text-embedding spaces, and we align the representations by matching the distances in the image space to those in the text space. We validated our method on two benchmark datasets that span a wide range of skin tones, PAD-UFES-20 and Fitzpatrick17k. The experimental results demonstrate that our approach consistently improves both classification accuracy and fairness across multiple evaluation metrics.
KW - Machine learning fairness
KW - Multimodal representation learning
KW - Skin cancer diagnosis
UR - https://www.scopus.com/pages/publications/105017405784
DO - 10.1007/s10278-025-01693-2
M3 - Article
AN - SCOPUS:105017405784
SN - 2948-2933
JO - Journal of Imaging Informatics in Medicine
JF - Journal of Imaging Informatics in Medicine
ER -