A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing such partitioning. Among the many clustering algorithms, the k-means algorithm is well known and widely applied in several areas, but it handles only numerical attributes. The k-modes algorithm extends k-means to categorical variables and has several variants, such as fuzzy k-modes. This paper presents a new attribute weighting method for the k-modes algorithm that uses impurity measures such as entropy and Gini impurity. The proposed algorithm considers the distribution of each attribute's categories both within the same cluster and between different clusters. As a result, categorical variables that the new algorithm identifies as more important than others have a significant influence on the similarity calculation, which improves clustering performance, as confirmed by experiments.
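The following is a minimal sketch of an impurity-weighted k-modes loop. It illustrates the general idea described above: attributes whose categories are pure within clusters but differ across clusters receive larger weights in the matching dissimilarity. It is not the paper's method. The weighting rule here is our own assumption: each attribute is scored by its overall impurity minus its size-weighted within-cluster impurity, and the scores are then normalized. All function names (`entropy`, `gini`, `weighted_distance`, `attribute_weights`, `weighted_kmodes`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


def entropy(values: Sequence[str]) -> float:
    """Shannon entropy (bits) of the category distribution in `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gini(values: Sequence[str]) -> float:
    """Gini impurity of the category distribution in `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def weighted_distance(x: Row, mode: Row, weights: Sequence[float]) -> float:
    """Weighted simple matching: add w_j for every attribute where x and mode differ."""
    return sum(w for xj, mj, w in zip(x, mode, weights) if xj != mj)


def attribute_weights(data: List[Row], labels: List[int], k: int,
                      impurity=entropy, eps: float = 1e-6) -> List[float]:
    """HYPOTHETICAL rule (not the paper's formula): score attribute j by overall
    impurity minus size-weighted within-cluster impurity, then normalize to sum to 1."""
    n, m = len(data), len(data[0])
    scores = []
    for j in range(m):
        overall = impurity([row[j] for row in data])
        within = 0.0
        for c in range(k):
            members = [row[j] for row, lab in zip(data, labels) if lab == c]
            within += len(members) / n * impurity(members)
        # eps keeps every weight positive so no attribute is ignored entirely
        scores.append(max(overall - within, 0.0) + eps)
    total = sum(scores)
    return [s / total for s in scores]


def weighted_kmodes(data: List[Row], k: int, n_iter: int = 20, impurity=entropy,
                    init: Optional[List[Row]] = None, seed: int = 0):
    """Alternate assignment, mode update and weight update until labels stop changing."""
    m = len(data[0])
    modes = list(init) if init is not None else random.Random(seed).sample(data, k)
    weights = [1.0 / m] * m  # start from equal weights, i.e. plain k-modes
    labels: List[int] = []
    for _ in range(n_iter):
        # 1. assign each object to the mode with the smallest weighted dissimilarity
        new_labels = [min(range(k), key=lambda c: weighted_distance(row, modes[c], weights))
                      for row in data]
        # 2. recompute each cluster's mode attribute by attribute (most frequent category)
        for c in range(k):
            members = [row for row, lab in zip(data, new_labels) if lab == c]
            if members:
                modes[c] = tuple(Counter(r[j] for r in members).most_common(1)[0][0]
                                 for j in range(m))
        # 3. re-weight attributes from the impurity of the new partition
        weights = attribute_weights(data, new_labels, k, impurity)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, modes, weights
```

In the following toy data, the first two attributes separate the groups and the third attribute is noise. After one pass, that third attribute's weight falls to nearly zero:

```python
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "x", "r"),
        ("b", "y", "p"), ("b", "y", "q"), ("b", "y", "r")]
labels, modes, weights = weighted_kmodes(data, k=2, init=[data[0], data[3]])
# labels == [0, 0, 0, 1, 1, 1]; weights is roughly [0.5, 0.5, 0.0]
```

The weights are re-estimated from the current partition. A poor initialization can therefore be reinforced rather than corrected, so trying several seeds would be prudent in this sketch.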

Original language: English
Pages (from-to): 979-990
Number of pages: 12
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 32
Issue number: 1
State: Published (2017)

Keywords

  • fuzzy k-modes clustering
  • fuzzy weighted k-modes clustering
  • k-modes clustering
  • weighted k-modes clustering

