Network-aware multiway join for MapReduce

Kenn Slagter, Ching Hsien Hsu, Yeh Ching Chung, Jong Hyuk Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

MapReduce is an effective tool for processing large amounts of data in parallel using a cluster of processors or computers. One common data processing task is the join operation, which combines two or more datasets based on values common to each. In this paper, we present a network-aware multi-way join for MapReduce (NAMM) that improves performance by redistributing the workload amongst reducers. NAMM achieves this by redistributing tuples directly between reducers using an intelligent network-aware algorithm. We show that the presented technique has significant potential to minimize the time required to join multiple datasets.
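
For illustration only, the join operation described in the abstract can be sketched as a plain reduce-side multiway equi-join in Python. This is a minimal sketch under stated assumptions and does not reproduce NAMM's network-aware redistribution of tuples between reducers; the function names (`map_phase`, `shuffle`, `reduce_phase`, `multiway_join`) and the sample datasets are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import product


def map_phase(datasets):
    """Emit (join_key, (dataset_index, value)) pairs, as a mapper would."""
    for idx, rows in enumerate(datasets):
        for key, value in rows:
            yield key, (idx, value)


def shuffle(pairs):
    """Group emitted pairs by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values, n_datasets):
    """Join all values sharing a key: one value from each dataset per tuple."""
    buckets = [[] for _ in range(n_datasets)]
    for idx, value in tagged_values:
        buckets[idx].append(value)
    # product() yields nothing when any dataset lacks this key (inner join).
    return [(key, *combo) for combo in product(*buckets)]


def multiway_join(datasets):
    """Run the three phases in-process over a list of (key, value) datasets."""
    groups = shuffle(map_phase(datasets))
    result = []
    for key in sorted(groups):
        result.extend(reduce_phase(key, groups[key], len(datasets)))
    return result


# Hypothetical sample data: three datasets sharing an integer join key.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "ink")]
cities = [(1, "Seoul"), (2, "Taipei")]

joined = multiway_join([users, orders, cities])
print(joined)
```

In this sketch every tuple with a given key lands on a single reducer call, so a heavily repeated key concentrates work in one place. That kind of imbalance is the general problem that workload redistribution amongst reducers is meant to address.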

Original language: English
Title of host publication: Grid and Pervasive Computing - 8th International Conference, GPC 2013 and Colocated Workshops, Proceedings
Pages: 73-80
Number of pages: 8
State: Published - 2013
Event: 8th International Conference on Grid and Pervasive Computing, GPC 2013 - Seoul, Korea, Republic of
Duration: 9 May 2013 to 11 May 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7861 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference on Grid and Pervasive Computing, GPC 2013
Country/Territory: Korea, Republic of
City: Seoul
Period: 9/05/13 to 11/05/13

Keywords

  • Hadoop
  • MapReduce
  • Multiway Join
  • Workload Redistribution
