TY - JOUR
T1 - Normalized class coherence change-based kNN for classification of imbalanced data
AU - Kim, Kyoungok
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/12
Y1 - 2021/12
AB - kNN is a widely used machine learning algorithm in many different domains because of its simplicity and its fairly good performance in practice. This study aims to enhance the performance of kNN on imbalanced datasets, a topic that has been relatively ignored in kNN research. The proposed algorithm, called the normalized class coherence change-based k-nearest neighbor (NCC-kNN) algorithm, determines the label of a test sample by computing the normalized class coherence changes at the class and sample levels for every possible class and assigning the sample to the class with the maximum value. It accounts for the tendency of minority classes to show lower class coherence than the majority class. NCC-kNN also uses an adaptive k for the class coherence, which is calculated in a weighted manner to reduce sensitivity to the choice of k. NCC-kNN was applied to 20 benchmark datasets with varying class imbalance and coherence, and its performance was compared with that of five kNN algorithms, as well as SMOTE and MetaCost with standard kNN as the base classifier. The proposed NCC-kNN outperformed the other kNN algorithms in the classification of imbalanced data, especially imbalanced data with low positive class coherence.
KW - Class coherence
KW - Imbalanced data
KW - Nearest neighbor classification
KW - kNN
UR - http://www.scopus.com/inward/record.url?scp=85111006834&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2021.108126
DO - 10.1016/j.patcog.2021.108126
M3 - Article
AN - SCOPUS:85111006834
SN - 0031-3203
VL - 120
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108126
ER -