HPC Workload Characterization Using Feature Selection and Clustering

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

22 Scopus citations

Abstract

Large high-performance computing (HPC) systems are expensive tools responsible for supporting thousands of scientific applications. However, it is not easy to determine the set of configurations that lets a workload make the best use of the storage and I/O systems. Users often rely on the default configurations provided by the system administrators, which typically results in poor performance. In an effort to identify the application characteristics most important to I/O performance, we applied several machine learning techniques to characterize these applications. To identify the features that are most relevant to I/O performance, we evaluate a number of different feature selection methods, e.g., mutual information regression and F regression, and develop a novel feature selection method based on Min-max mutual information. These feature selection methods allow us to sift through a large set of real-world workloads collected from NERSC's Cori supercomputer system and identify the most important features. We employ a number of different clustering algorithms, including KMeans, Gaussian Mixture Model (GMM) and Ward linkage, and measure the cluster quality with the Davies-Bouldin Index (DBI), Silhouette and a new Combined Score developed for this work. The cluster evaluation result shows that the test dataset could be best divided into three clusters, where cluster 1 contains mostly small jobs with operations on standard I/O units, cluster 2 consists of medium-sized parallel jobs dominated by read operations, and cluster 3 includes large parallel jobs with heavy write operations. The cluster characteristics suggest that using the parallel I/O library MPI-IO and a large number of parallel cores is important to achieve high I/O throughput.
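
The general shape of such a pipeline (score features against a performance target, keep the top-ranked ones, cluster, then compare cluster counts with DBI and Silhouette) can be illustrated with a minimal scikit-learn sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than Cori logs, the column layout and the target `y` are hypothetical, and only the standard mutual information and F regression scorers are used. The paper's Min-max mutual information method and its Combined Score are not reproduced here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_selection import f_regression, mutual_info_regression
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for job logs (assumption): columns 0-1 carry the
# cluster structure, columns 2-4 are pure noise.
rng = np.random.default_rng(0)
centers = [[0, 0], [10, 10], [-10, 10]]
informative, _ = make_blobs(n_samples=300, centers=centers,
                            cluster_std=1.0, random_state=0)
noise = rng.normal(size=(300, 3))
X = np.hstack([informative, noise])

# Hypothetical performance target that depends only on the informative columns.
y = 2.0 * informative[:, 0] + informative[:, 1] + rng.normal(scale=0.5, size=300)

# Score every feature against the target with two standard scorers.
mi_scores = mutual_info_regression(X, y, random_state=0)
f_scores, _ = f_regression(X, y)

# Keep the two features with the highest mutual information.
selected = np.argsort(mi_scores)[-2:]
X_sel = StandardScaler().fit_transform(X[:, selected])

def cluster(name, k):
    """Return cluster labels for X_sel from the named algorithm with k clusters."""
    if name == "kmeans":
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_sel)
    if name == "gmm":
        return GaussianMixture(n_components=k, random_state=0).fit_predict(X_sel)
    return AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X_sel)

# For each algorithm, pick the cluster count with the highest Silhouette
# (higher is better) and report DBI alongside it (lower is better).
best_k = {}
for name in ("kmeans", "gmm", "ward"):
    results = {}
    for k in range(2, 6):
        labels = cluster(name, k)
        results[k] = (silhouette_score(X_sel, labels),
                      davies_bouldin_score(X_sel, labels))
    best_k[name] = max(results, key=lambda k: results[k][0])
    sil, dbi = results[best_k[name]]
    print(f"{name}: best k={best_k[name]} silhouette={sil:.3f} DBI={dbi:.3f}")
```

On real job logs the two metrics need not agree on a single cluster count, which is one plausible motivation for combining them into one score.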

Original language: English
Title of host publication: SNTA 2020 - Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics
Publisher: Association for Computing Machinery, Inc
Pages: 33-40
Number of pages: 8
ISBN (Electronic): 9781450379809
State: Published - 23 Jun 2020
Event: 3rd International Workshop on Systems and Network Telemetry and Analytics, SNTA 2020 - Stockholm, Sweden
Duration: 23 Jun 2020 → …

Publication series

Name: SNTA 2020 - Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics

Conference

Conference: 3rd International Workshop on Systems and Network Telemetry and Analytics, SNTA 2020
Country/Territory: Sweden
City: Stockholm
Period: 23/06/20 → …

Keywords

  • clustering
  • feature selection
  • high performance computing
  • supercomputer
  • workload characterization
