TY - JOUR
T1 - A Cognitive Framework for Learning Debiased and Interpretable Representations via Debiasing Global Workspace
AU - Hong, Jinyung
AU - Jeon, Eun Som
AU - Kim, Changhoon
AU - Park, Keun Hee
AU - Nath, Utkarsh
AU - Yang, Yezhou
AU - Turaga, Pavan
AU - Pavlic, Theodore P.
N1 - Publisher Copyright:
© 2024, ML Research Press. All rights reserved.
PY - 2024
Y1 - 2024
N2 - When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with target labels. This is especially problematic if these irrelevant features are easier for the model to learn than the truly relevant ones. Many existing debiasing methods have been proposed to address this issue, but they often require predefined bias labels and entail significantly increased computational complexity by incorporating additional auxiliary models. Instead, we provide a perspective orthogonal to existing approaches, inspired by cognitive science, specifically Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Furthermore, DGW improves the transparency of decision-making processes by visualizing, through attention masks, which features of the inputs the model focuses on during training and inference. We begin by proposing an instantiation of GWT for the debiasing method. We then outline the implementation of each component within DGW. Finally, we validate our method across various biased datasets, demonstrating its effectiveness in mitigating biases and improving model performance.
AB - When trained on biased datasets, Deep Neural Networks (DNNs) often make predictions based on attributes derived from features spuriously correlated with target labels. This is especially problematic if these irrelevant features are easier for the model to learn than the truly relevant ones. Many existing debiasing methods have been proposed to address this issue, but they often require predefined bias labels and entail significantly increased computational complexity by incorporating additional auxiliary models. Instead, we provide a perspective orthogonal to existing approaches, inspired by cognitive science, specifically Global Workspace Theory (GWT). Our method, Debiasing Global Workspace (DGW), is a novel debiasing framework that consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Furthermore, DGW improves the transparency of decision-making processes by visualizing, through attention masks, which features of the inputs the model focuses on during training and inference. We begin by proposing an instantiation of GWT for the debiasing method. We then outline the implementation of each component within DGW. Finally, we validate our method across various biased datasets, demonstrating its effectiveness in mitigating biases and improving model performance.
UR - https://www.scopus.com/pages/publications/105014725969
M3 - Conference article
AN - SCOPUS:105014725969
SN - 2640-3498
VL - 285
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 2nd Edition of the Workshop on Unifying Representations in Neural Models, UniReps 2024
Y2 - 14 December 2024
ER -