A Caltech Library Service

Joint Optimization of Overlapping Phases in MapReduce

Lin, Minghong and Zhang, Li and Wierman, Adam and Tan, Jian (2013) Joint Optimization of Overlapping Phases in MapReduce. Performance Evaluation, 70 (10). pp. 720-735. ISSN 0166-5316.

PDF - Accepted Version
See Usage Policy.


Use this Persistent URL to link to this item:


MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. There are a variety of scheduling challenges within the MapReduce architecture, and this paper studies the challenges that result from the overlapping of the "map" and "shuffle" phases. We propose a new, general model for this scheduling problem, and validate this model using cluster experiments. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate (i.e., in the resource augmentation framework). Finally, we validate the algorithms using a workload trace from a Google cluster and show that the algorithms are near optimal in practical settings.

Item Type:Article
Related URLs:
URLURL TypeDescription
Additional Information:© 2013 Elsevier B.V. Available online 28 August 2013.
Subject Keywords:MapReduce, Job scheduling, Overlapping tandem queues
Issue or Number:10
Record Number:CaltechAUTHORS:20130930-120231815
Persistent URL:
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:41556
Deposited By: Kristin Buxton
Deposited On:30 Sep 2013 19:31
Last Modified:03 Oct 2019 05:50

Repository Staff Only: item control page