Performance evaluation of spatial data management systems using GeoSpark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

In this paper, we evaluate the performance of spatial data management systems in distributed computing environments. Given that GeoSpark outperforms other spatial systems in many scenarios, as reported in several studies, we choose spatial data management systems using GeoSpark for this evaluation. Even though GeoSpark supports various storage engines as its underlying data store, the effects of the storage engines on spatial data processing have not been well studied. To address this limitation, we evaluate the performance of GeoSpark using two underlying data stores: 1) HDFS and 2) MongoDB. We first design and build distributed experimental environments based on Amazon EC2 and EMR using up to 10 nodes. Through extensive experiments on three synthetic and real data sets, we show that the overall performance of both HDFS- and MongoDB-based GeoSpark improves as we increase the number of nodes. We also show that HDFS-based GeoSpark generally outperforms MongoDB-based GeoSpark, especially for large-scale data sets. In addition, we demonstrate that the proper use of caching on HDFS-based GeoSpark can improve the overall query processing performance by up to three orders of magnitude.

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
Editors: Wookey Lee, Luonan Chen, Yang-Sae Moon, Julien Bourgeois, Mehdi Bennis, Yu-Feng Li, Young-Guk Ha, Hyuk-Yoon Kwon, Alfredo Cuzzocrea
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 197-200
Number of pages: 4
ISBN (Electronic): 9781728160344
State: Published - Feb 2020
Event: 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020 - Busan, Korea, Republic of
Duration: 19 Feb 2020 to 22 Feb 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020

Conference

Conference: 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
Country/Territory: Korea, Republic of
City: Busan
Period: 19/02/20 to 22/02/20

Keywords

  • Distributed environments
  • GeoSpark
  • Large-scale spatial data
  • Performance evaluation
