Joint Optimization of Overlapping Phases in MapReduce
- Creators
- Lin, Minghong
- Zhang, Li
- Wierman, Adam
- Tan, Jian
Abstract
MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. There are a variety of scheduling challenges within the MapReduce architecture, and this paper studies the challenges that result from the overlapping of the "map" and "shuffle" phases. We propose a new, general model for this scheduling problem, and validate this model using cluster experiments. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate (i.e., in the resource augmentation framework). Finally, we validate the algorithms using a workload trace from a Google cluster and show that the algorithms are near optimal in practical settings.
Additional Information
© 2013 Elsevier B.V. Available online 28 August 2013.Attached Files
Accepted Version - performance-mapreduce.pdf
Files
Name | Size | Download all |
---|---|---|
md5:a6b5048d4e4d9f62003aa6c83b5bda92
|
820.0 kB | Preview Download |
Additional details
- Eprint ID
- 41556
- Resolver ID
- CaltechAUTHORS:20130930-120231815
- Created
-
2013-09-30Created from EPrint's datestamp field
- Updated
-
2021-11-10Created from EPrint's last_modified field