Abstract
Data stream clustering is an unsupervised learning method for sequential data. It poses several challenges, such as handling limited memory, tracking evolving clusters, and detecting noise data. We propose a hybrid data stream clustering method that combines model-based and density-based clustering. The proposed method detects evolving clusters quickly and makes cluster information easy to obtain. To handle noise data, we use multiple hypothesis testing with the positive false discovery rate as the controlled decision error. We use a density-based algorithm to discover cluster evolution from newly arrived data. We then estimate a Gaussian mixture model and update the clustering results by combining past cluster information with the cluster information for the newly arrived data. We applied the proposed method to several synthetic and real datasets. The experimental results demonstrate that the proposed method works effectively on data streams that include noise data. In addition, the proposed method is more robust to the choice of input parameters than an existing density-based data stream clustering method.
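
The abstract does not spell out the test statistic or the estimator behind the noise-handling step, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a single, already fitted 2-D Gaussian component, takes the null hypothesis "this point was generated by the component", and converts squared Mahalanobis distances to p-values using the chi-square distribution with 2 degrees of freedom. The q-values use a simplified Storey-style estimate as a stand-in for positive false discovery rate control. All function names and the defaults `lam` and `alpha` are hypothetical.

```python
import math


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a Gaussian component."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Closed-form inverse of the 2x2 covariance applied to (dx, dy).
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def p_values(points, mean, cov):
    """p-value of each point under H0: the point belongs to the component."""
    # Under H0 the squared distance is chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return [math.exp(-mahalanobis_sq_2d(p, mean, cov) / 2.0) for p in points]


def q_values(pvals, lam=0.5):
    """Simplified Storey-style q-values (assumed stand-in for pFDR control)."""
    m = len(pvals)
    # Estimate the share of true nulls from the p-values above lam.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    if pi0 == 0.0:
        pi0 = 1.0  # conservative fallback when no p-value exceeds lam
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


def flag_noise(points, mean, cov, alpha=0.05):
    """Flag a point as noise when its q-value is at most alpha."""
    return [q <= alpha for q in q_values(p_values(points, mean, cov))]
```

Calling `flag_noise(points, mean, cov)` returns one Boolean per point. In a mixture setting, one plausible extension is to test each point against its closest component, but that choice is likewise an assumption and not something stated in the abstract.
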
Original language | English |
---|---|
Pages (from-to) | 717-732 |
Number of pages | 16 |
Journal | Intelligent Data Analysis |
Volume | 23 |
Issue number | 3 |
State | Published - 2019 |
Keywords
- False discovery rate
- Gaussian mixture
- Multiple testing
- Noise data