Abstract
The spread of social networking services (SNS) and the Internet of Things (IoT) has ushered in the era of big data. Discovering and using the value in large volumes of data requires analysis, and analysis in turn requires ETL (extract, transform, load) processing. An ETL workload consists of thousands of jobs that run on limited system resources and are bundled and scheduled in complex ways. Because data volumes grow as the business changes, ETL runs can be delayed, so performance management is required. This study addresses ETL optimization by applying a genetic algorithm, a type of metaheuristic. The objective function minimizes both the execution time of the ETL batch and the load on server CPU resources. The data set consists of the three-month average CPU usage and execution time of 260 ETL jobs running in a production business environment. The algorithm is run repeatedly with different parameter settings to obtain optimal results. The study is expected to serve as an important basis for optimizing ETL operations in big data systems.
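
To make the two-part objective concrete, here is a minimal sketch of a genetic algorithm that assigns start slots to batch jobs. It is not the authors' implementation: the abstract does not specify the encoding, the operators, or how the two objectives are combined. The weighted penalty, the names `evaluate` and `genetic_schedule`, the parameters `cpu_limit` and `penalty`, and the toy job list are all assumptions made for illustration.

```python
import random

def evaluate(starts, jobs, cpu_limit=100.0, penalty=10.0):
    """Return (cost, makespan, peak_cpu) for a schedule; lower cost is better.

    jobs is a list of (duration_in_slots, cpu_percent) pairs and starts holds
    one start slot per job. The weighted-sum cost is an assumed formulation.
    """
    makespan = max(s + d for s, (d, _) in zip(starts, jobs))
    load = [0.0] * makespan
    for s, (d, cpu) in zip(starts, jobs):
        for t in range(s, s + d):
            load[t] += cpu  # CPU demand of every job active in slot t
    peak = max(load)
    cost = makespan + penalty * max(0.0, peak - cpu_limit)
    return cost, makespan, peak

def genetic_schedule(jobs, horizon, pop_size=40, generations=200,
                     mutation_rate=0.1, seed=0):
    """Search for start slots in [0, horizon) using a simple GA (needs >= 2 jobs)."""
    rng = random.Random(seed)
    n = len(jobs)

    def cost(individual):
        return evaluate(individual, jobs)[0]

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    population = [[rng.randrange(horizon) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best schedule forward
        while len(next_population) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally move a job to a random start slot.
            child = [rng.randrange(horizon) if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = min(population, key=cost)
    return best

# Hypothetical toy data: (duration in slots, average CPU %). Not from the study.
TOY_JOBS = [(3, 40.0), (2, 50.0), (4, 30.0), (1, 70.0), (2, 60.0), (3, 20.0)]

if __name__ == "__main__":
    schedule = genetic_schedule(TOY_JOBS, horizon=12)
    print(schedule, evaluate(schedule, TOY_JOBS))
```

Raising `penalty` pushes the search toward schedules that stay under the CPU limit at the expense of a longer batch window. Re-running with different `pop_size`, `generations`, and `mutation_rate` values mirrors the parameter sweep the abstract mentions.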
| Translated title of the contribution | A Study on Optimization of ETL Batch Job Scheduling using Genetic Algorithm to Provide Data Service at The Right Time |
|---|---|
| Original language | Korean |
| Pages (from-to) | 71-84 |
| Number of pages | 14 |
| Journal | Entrue Journal of Information Technology |
| Volume | 16 |
| Issue number | 2 |
| State | Published - Dec 2017 |