TY - JOUR
T1 - A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures
AU - Kim, Kyoungok
N1 - Publisher Copyright:
© 2017 - IOS Press and the authors.
PY - 2017
Y1 - 2017
N2 - Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing partitioning. Among several clustering algorithms, the k-means algorithm is well-known and widely applied in several areas that only handle numerical attributes. The k-modes algorithm is an extension of the k-means algorithm that deals with categorical variables, which has several variations such as fuzzy methods. This paper presents a new attribute weighting method for the k-modes algorithm that utilizes impurity measures such as entropy and Gini impurity. The proposed algorithm considers both the distribution of categories of attributes within the same cluster and between different clusters. By doing this, categorical variables defined as more important than others by the new algorithm have a significant influence on the similarity calculation, and this results in improved clustering performance, which was confirmed by experiments.
AB - Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing partitioning. Among several clustering algorithms, the k-means algorithm is well-known and widely applied in several areas that only handle numerical attributes. The k-modes algorithm is an extension of the k-means algorithm that deals with categorical variables, which has several variations such as fuzzy methods. This paper presents a new attribute weighting method for the k-modes algorithm that utilizes impurity measures such as entropy and Gini impurity. The proposed algorithm considers both the distribution of categories of attributes within the same cluster and between different clusters. By doing this, categorical variables defined as more important than others by the new algorithm have a significant influence on the similarity calculation, and this results in improved clustering performance, which was confirmed by experiments.
KW - fuzzy k-modes clustering
KW - fuzzy weighted k-modes clustering
KW - k-modes clustering
KW - weighted k-modes clustering
UR - http://www.scopus.com/inward/record.url?scp=85009962518&partnerID=8YFLogxK
U2 - 10.3233/JIFS-16157
DO - 10.3233/JIFS-16157
M3 - Article
AN - SCOPUS:85009962518
SN - 1064-1246
VL - 32
SP - 979
EP - 990
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
IS - 1
ER -