TY - JOUR
T1 - A study on evaluation measures for unsupervised outlier detection
AU - La, Sunmin
AU - Cho, Nam Wook
N1 - Publisher Copyright: © 2020 ICIC International. All rights reserved.
PY - 2020
Y1 - 2020
AB - Outlier detection is a data mining technique used to identify outlying observations that may be significant in a dataset. Research on outlier detection, however, has mainly focused on supervised approaches, which require labeled training and test datasets. Unsupervised approaches are more appropriate for many applications, such as network intrusion detection and fraud detection, but their suitability for determining the degree of outlierness of a dataset has not been fully addressed because the ground truth is usually unavailable. In this paper, evaluation measures for unsupervised outlier detection, which can effectively measure the outlierness of a dataset, are proposed. To verify the effectiveness of the proposed measures, experiments were conducted on University of California, Irvine (UCI) machine learning datasets using a k-nearest neighbors (k-NN) algorithm.
KW - External measure
KW - Gini index
KW - K-nearest neighbors (k-NN)
KW - Outlierness
KW - Unsupervised outlier detection
UR - http://www.scopus.com/inward/record.url?scp=85082688018&partnerID=8YFLogxK
U2 - 10.24507/icicel.14.05.515
DO - 10.24507/icicel.14.05.515
M3 - Article
AN - SCOPUS:85082688018
SN - 1881-803X
VL - 14
SP - 515
EP - 520
JO - ICIC Express Letters
JF - ICIC Express Letters
IS - 5
ER -