Hybrid data stream clustering by controlling decision error

Jeonghwa Lee, Taek Ho Lee, Chi Hyuck Jun

Research output: Contribution to journal > Article > peer-review

Abstract

Data stream clustering is an unsupervised learning method for sequential data. It poses several challenges, such as handling limited memory, dealing with evolving clusters, and detecting noise data. We propose a hybrid data stream clustering method that combines model-based clustering and density-based clustering. The proposed method finds evolving clusters quickly and obtains cluster information easily. We use multiple hypothesis testing to handle noise data by controlling a decision error, for which we employ the positive false discovery rate. We use a density-based algorithm to discover cluster evolution from newly arrived data. Then, we estimate a Gaussian mixture model and update the clustering results by combining past cluster information with the cluster information for newly arrived data. We applied the proposed method to several synthetic and real datasets. The experimental results demonstrate that the proposed method works effectively for a data stream that includes noise data. In addition, the proposed method is more robust to the choice of input parameters than an existing density-based data stream clustering method.
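
The noise-handling step described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes 2-D points, a single Gaussian cluster with known mean and covariance, and the null hypothesis "this point belongs to the cluster". As a stand-in for the positive false discovery rate named in the abstract, it uses a simpler FDR-style q-value. All function names, the parameter `lam`, and the sample points are hypothetical.

```python
import math


def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a Gaussian cluster."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # v^T * inverse(cov) * v, with the 2x2 inverse written out by hand.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def p_value_2d(x, mean, cov):
    """P-value under the null 'x was drawn from this cluster'.

    For 2-D Gaussian data the squared distance is chi-square with 2 degrees
    of freedom, whose upper tail probability is exp(-d2 / 2).
    """
    return math.exp(-mahalanobis_sq_2d(x, mean, cov) / 2.0)


def q_values(pvals, lam=0.5):
    """FDR-style q-values (assumed stand-in for a pFDR-based estimate)."""
    m = len(pvals)
    # Estimate the share of true nulls from the p-values above lam.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


def flag_noise(points, mean, cov, alpha=0.05):
    """Return True for each point declared noise at error level alpha."""
    pvals = [p_value_2d(x, mean, cov) for x in points]
    return [q <= alpha for q in q_values(pvals)]


# Hypothetical batch: six points near the cluster, two far away.
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
points = [(0.1, 0.2), (-0.5, 0.3), (1.0, -0.8), (0.4, 0.9),
          (-1.2, -0.3), (0.7, 0.1), (6.0, 6.0), (-7.0, 5.0)]
flags = flag_noise(points, mean, cov)
print(flags)
```

Testing a whole batch at once, rather than thresholding each point's distance separately, is what lets the error level `alpha` bound the expected share of genuine cluster points among those discarded as noise.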

Original language: English
Pages (from-to): 717-732
Number of pages: 16
Journal: Intelligent Data Analysis
Volume: 23
Issue number: 3
State: Published - 2019

Keywords

  • False discovery rate
  • Gaussian mixture
  • Multiple testing
  • Noise data
